In 1960, the mathematician Ernst Specker described a simple example of nonclassical correlations,
the counterintuitive features of which he dramatized using a parable about a seer who sets
an impossible prediction task to his daughter's suitors. We revisit this example here, using it as 
an entree to three central concepts in quantum foundations: contextuality, Bell-nonlocality, and 
complementarity. Specifically, we show that Specker's parable offers a narrative thread that weaves 
together a large number of results, including: the impossibility of measurement-noncontextual and 
outcome-deterministic ontological models of quantum theory (the 1967 Kochen-Specker theorem), 
in particular the recent state-specific pentagram proof of Klyachko; the impossibility of Bell-local 
models of quantum theory (Bell's theorem), especially the proofs by Mermin and Hardy and ex- 
tensions thereof; the impossibility of a preparation-noncontextual ontological model of quantum 
theory; and the existence of triples of positive operator valued measures (POVMs) that can be 
measured jointly pairwise but not triplewise. Along the way, several novel results are presented, 
including: a generalization of a theorem by Fine connecting the existence of a joint distribution 
over outcomes of counterfactual measurements to the existence of a measurement-noncontextual 
and outcome-deterministic ontological model; a generalization of Klyachko's proof of the Kochen- 
Specker theorem from pentagrams to a family of star polygons; a proof of the Kochen-Specker 
theorem in the style of Hardy's proof of Bell's theorem (i.e., one that makes use of the failure of 
the transitivity of implication for counterfactual statements); a categorization of contextual and 
Bell-nonlocal correlations in terms of frustrated networks; a derivation of a new inequality testing 
preparation noncontextuality; and lastly, some novel results on the joint measurability of POVMs 
and the question of whether these can be modeled noncontextually. Finally, we emphasize that 
Specker's parable of the over-protective seer provides a novel type of foil to quantum theory, chal- 
lenging us to explain why the particular sort of contextuality and complementarity embodied therein 
does not arise in a quantum world. 
I. INTRODUCTION

In the field of quantum foundations, the mathemati- 
cian Ernst Specker is rightly famous for introducing, with 
co-author Simon Kochen, the concept of a noncontextual 
hidden variable model and proving that such a model cannot
underlie quantum theory. This 1967 result, known
as the Kochen-Specker theorem [1], continues to be an
active subject of research today (see Ref. Q for a
bibliography). One finds precursors to this result in the 1957
work of Gleason Q and in Bell's 1966 review article on
hidden variable models (which refers to Gleason's result) Q,
but also in a 1960 paper by Specker entitled "The logic of
propositions that are not simultaneously decidable".¹



This article studied logical features of quantum theory, 
in particular the question of the consistency of counterfactual
propositions concerning the values of observables
that are not comeasurable.² One of the points of the
paper was to show that it is possible to conceive of an 
implication relation that is not transitive. 

The idea is illustrated with a parable wherein an over- 
protective seer sets a simple prediction task to his daugh- 
ter's suitors. The challenge cannot be met because the 
seer asks the suitors for a noncontextual assignment of 
values but measures a system for which the statistics are 
inconsistent with such an assignment. The present ar- 
ticle considers the parable anew and seeks to connect 
it with modern developments in quantum foundations. 
In particular, we explore the extent to which the sorts 
of correlations instantiated in the seer's prediction game 
can be achieved in a quantum world. Although the pre- 
cise correlations that are required by the seer do not oc- 
cur in quantum theory, the prediction game is found to 
be a good pump for quantum intuitions. It leads quite 
naturally to proofs of nonlocality and contextuality, to 
a novel kind of complementarity and to the notion of 
stronger-than-quantum correlations. Indeed, it provides 
a narrative thread that is able to weave together a great 
number of important modern results. That so much can 
be gleaned from this little prediction game is a testament 
to the depth of Specker's work. We offer this article as a 
small tribute to him on the occasion of his 90th birthday. 



A. The parable of the over-protective seer 

We begin by reproducing Specker's parable of the
over-protective seer,³ with clarifications by us in square
brackets:

At the Assyrian School of Prophets in 
Arba'ilu in the time of King Asarhaddon 
[(681-669 BCE)], there taught a seer from 
Nineva. He was a distinguished representa- 
tive of his faculty (eclipses of the sun and 
moon) and aside from the heavenly bod- 
ies, his interest was almost exclusively in his 
daughter. His teaching success was limited; 
the subject proved to be dry and required 
a previous knowledge of mathematics which 



1 The 1967 Kochen-Specker theorem [1] improves upon many of
these earlier results by making use of a finite set of observables. It
should be noted, however, that Bell's 1964 proof [f| of quantum
nonlocality is also a proof of contextuality using only a finite
set of observables; unlike the Kochen-Specker proof, it is
state-specific, the first example of this kind.

2 Specker did not use the modern term "counterfactual", but
instead referred to "infuturabilities", which had been discussed in
a scholastic context in connection with the problem of whether
God's omniscience extended to knowing the truths of
propositions concerning what would have occurred if some event which
did not happen had in fact happened.

3 Our translation is an amalgam of those provided by Stairs Q
and Seevinck 0.
was scarcely available. If he did not find the
student interest which he desired in class, he 
did find it elsewhere in overwhelming mea- 
sure. His daughter had hardly reached a mar- 
riageable age when he was flooded with re- 
quests for her hand from students and young 
graduates. And though he did not believe 
that he would always have her by his side, 
she was in any case still too young and her 
suitors in no way worthy. In order that the 
suitors might convince themselves of their un- 
worthiness, he promised them that she would 
be wed to the one who could solve a predic- 
tion task that was posed to them. 

Each suitor was taken before a table on 
which three little boxes stood in a row, [each 
of which might or might not contain a gem], 
and was asked to predict which of the boxes 
contained a gem and which did not. But no 
matter how many times they tried, it seemed 
impossible to succeed in this task. After each 
suitor had made his prediction, he was or- 
dered by the father to open any two boxes 
which he had predicted to be both empty or 
any two boxes which he had predicted to be 
both full [in accordance with whether he had 
predicted there to be at most one gem among 
the three boxes, or at least two gems, respec- 
tively]. But it always turned out that one 
contained a gem and the other one did not, 
and furthermore the stone was sometimes in 
the first and sometimes in the second of the 
boxes that were opened. But how can it be 
possible, given three boxes, to neither be able 
to pick out two as empty nor two as full? 

The daughter would have remained un- 
married until the father's death, if not for 
the fact that, after the prediction of the son 
of a prophet [whom she fancied] , she quickly 
opened two boxes herself, one of which had 
been indicated to be full and the other empty, 
and the suitor's prediction [for these two 
boxes] was found, in this case, to be correct. 
Following the weak protest of her father that 
he had wanted two other boxes opened, she 
tried to open the third. But this proved im- 
possible, whereupon the father grudgingly
admitted that the prediction, being unfalsified,
was valid. [The daughter and the suitor were 
married and lived happily ever after.] 



B. Contextuality and Complementarity 

Specker's parable presents us with apparently impossi- 
ble correlations; as he says "But how can it be possible, 
given three boxes, to neither be able to pick out two 
as empty nor two as full?" Indeed, if a suitor reasons 



classically, then he expects that even if he chooses a con- 
figuration of gems at random from among the eight pos- 
sibilities, it will be the true configuration one time out 
of eight, and when he opens two boxes he has marked 
both empty or both full, his prediction will be found to 
be correct one time out of four. The fact that no suitor 
manages to succeed after many trials suggests that this 
reasoning must be flawed and that whichever two boxes 
are opened, one will be found full and the other empty. 
Such correlations are contextual in the sense that if one 
wishes to explain the measurements (opening a box) as 
revealing a pre-existing property, then one must imag- 
ine that the outcome of a measurement (or equivalently, 
the property that is measured) is context-dependent - 
whether a gem is seen or not in the first box depends on 
whether that box was opened together with the second 
or together with the third. The seer's challenge cannot 
be met by the suitors because he asks them for a non- 
contextual assignment of outcomes (i.e. a specification of 
whether a gem will be found or not in each box, indepen- 
dent of which other box is opened with it) but measures a 
system for which the statistics are inconsistent with such 
an assignment.⁴

To imagine a world wherein the parable might occur, 
Specker must effectively posit the existence of a system 
that exhibits a particular kind of complementarity: the 
system must be such that three distinct measurements 
can be implemented upon it, any pair of which can be
measured jointly, but where a joint measurement of all three
is not possible. To see this, one need only note that if 
all three binary-outcome measurements could be imple- 
mented jointly, some pair would necessarily be found to 
have correlated outcomes. 

We now ask the obvious question: Can the parable be 
implemented in quantum theory? The reader is urged to 
pause and give this question some thought before reading 
on. 

There is of course a trivial sense in which the para- 
ble can be implemented in a quantum world, namely the 
same way that it can be implemented in a classical world: 
through a hidden mechanism under the seer's table and 
under his control, which inserts and removes gems from 
the closed boxes at his will. Such a mechanism would 
allow the seer to enforce complementarity and
contextuality "by hand", so to speak. However, this is clearly not
what Specker had in mind, because had that been the 
case, the seer would not have been so easily stymied by 
his daughter's trick, as there would have been no reason 
why the third box could not have been opened. Rather, 
the seer seems to be in possession of a set of "magic" 
boxes that have particular, rather than arbitrary, corre- 
lations. Thus in asking the question whether the parable 



4 Because the suitors do not fathom this possibility, they are led to 
interpret their consistent failure to provide a correct prediction 
as a confirmation of the seer's assessment of their worth. It is 
in this sense that the seer's task is devised "[i]n order that the 
suitors might convince themselves of their unworthiness".
can be implemented in quantum theory, we mean: does
quantum theory allow for this sort of "magic" , which 
would be truly surprising for a naive suitor familiar only 
with classical theories, which do not incorporate contex- 
tuality and complementarity at a fundamental level? 

Certainly, both complementarity and contextuality are 
required at a fundamental level in quantum theory - mea- 
surements that cannot be implemented jointly, and cor- 
relations that cannot be explained by noncontextual pre- 
existing properties (see Ref. Q for a review). But what 
about the particular correlations of the Specker parable? 
To get this kind of contextuality, it is necessary to find a 
situation wherein there are very specific sorts of limita- 
tions on joint measurability — there must exist a triple 
of measurements that can only be implemented jointly in 
pairs. For projective measurements in quantum theory, 
this sort of limitation on joint measurability does not
occur. The reason is as follows. Two Hermitian operators
can be jointly measured if and only if they are jointly
diagonalizable (equivalently, if they commute). But if we
have three Hermitian operators $A_1$, $A_2$, and $A_3$, and
each pair of operators is jointly diagonalizable, then all
three are jointly diagonalizable. This is true for any number
of Hermitian operators: one can implement all of them jointly
if and only if one can implement every pair jointly.
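A small numerical illustration of this claim, assuming NumPy, is sketched below; it is an example, not a proof. We build three hypothetical pairwise-commuting Hermitian matrices (they share an eigenbasis by construction), confirm that every pair commutes, and then recover one common eigenbasis without reference to how the matrices were built, from the eigenvectors of a generic real combination of all three. Measuring in that basis is a joint measurement of all three observables. The coefficients 0.37 and 0.71 are arbitrary choices that, generically, make the combination's spectrum nondegenerate.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
d = 4  # Hilbert-space dimension (arbitrary choice for illustration)

# Hypothetical example: three Hermitian operators sharing the eigenbasis U,
# so that every pair commutes (is jointly measurable projectively).
z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(z)  # a random unitary
A = [U @ np.diag(rng.normal(size=d)) @ U.conj().T for _ in range(3)]


def commutator_norm(X, Y):
    return np.linalg.norm(X @ Y - Y @ X)


def off_diagonal_norm(X):
    return np.linalg.norm(X - np.diag(np.diag(X)))


# Pairwise compatibility: all three commutators vanish (up to rounding).
pairwise_ok = all(
    commutator_norm(A[i], A[j]) < 1e-10 for i in range(3) for j in range(i + 1, 3)
)

# Recover a common eigenbasis from a generic real combination, without using U.
C = A[0] + 0.37 * A[1] + 0.71 * A[2]
_, V = np.linalg.eigh(C)

# In the basis V, every A_k is diagonal: one projective measurement measures all three.
triplewise_ok = all(off_diagonal_norm(V.conj().T @ Ak @ V) < 1e-8 for Ak in A)
print(pairwise_ok, triplewise_ok)
```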

Nonetheless, one can imagine modifying the parable 
in various different ways to obtain something for which 
an analogue can be found in quantum theory, and these 
different modifications are the topics of the different sec- 
tions of our article. In the following we outline each of 
them in turn. 



C. Outline 

We begin by providing, in Sec. II, a formalization of
the original parable as well as some refinements and elab- 
orations thereof, together with definitions of the key con- 
cepts. We then present the four different themes inspired 
by the parable, with an interlude on frustrated networks. 

A double-query n-box system allowing only
adjacent queries (Sec. III). The seer could have a set
of n boxes, arranged in a ring, for which only adjacent 
pairs of boxes can be opened jointly. For n odd, classi- 
cal intuition leads one to expect that there must exist at 
least one adjacent pair of boxes that are either both full 
or both empty, but we can imagine that the seer has a 
special system wherein, regardless of which adjacent pair 
of boxes is opened, it is always the case that one is found 
full and the other empty. The n = 3 case, which corre- 
sponds to the original parable, is exceptional because the 
adjacent pairs constitute all the pairs. For n > 3, this 
is not the case, and so there is no longer any obstacle to 
finding a set of projective measurements that have the 
same pattern of joint measurability as these boxes. In- 
deed, one can find such sets. There are then two ways of 
trying to obtain a quantum analogue of the new parable. 

i) Klyachko's proof of contextuality. Find a quantum 



state that yields a nonzero probability of anti-correlation 
for every adjacent pair. When the overall probability is 
higher than one could account for classically, we arrive 
at a Klyachko-type proof of quantum contextuality Q . 

ii) A new variant of Klyachko's proof of contextuality.
Find a quantum state that supports the implication from 
one outcome to the opposite outcome for every adjacent 
pair in the ring and that assigns a non-zero probability 
to the first outcome in the sequence of inferences. In 
conjunction with the transitivity of implication (a
consequence of noncontextuality), and the fact that the ring
contains an odd number of boxes, this gives rise to a 
contradiction, thereby demonstrating the contextuality 
of quantum theory. 
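The classical expectation behind the n-box ring can be checked by brute force. The plain-Python sketch below (the helper name is our own) enumerates every box-by-box assignment of empty (0) or full (1) to a ring of n boxes and counts the adjacent pairs that come out anti-correlated. Because the number of value changes around a closed ring is always even, an odd ring admits at most n - 1 anti-correlated adjacent pairs, so some adjacent pair is always both full or both empty; an even ring can be perfectly alternating.

```python
from itertools import product


def max_anticorrelated_adjacent_pairs(n):
    """Most adjacent pairs (i, i+1 mod n) with opposite values over all 2**n
    box-by-box (noncontextual) assignments of empty = 0 or full = 1."""
    best = 0
    for values in product((0, 1), repeat=n):
        count = sum(values[i] != values[(i + 1) % n] for i in range(n))
        best = max(best, count)
    return best


for n in (3, 4, 5, 6, 7):
    print(n, max_anticorrelated_adjacent_pairs(n))
# prints: 3 2 / 4 4 / 5 4 / 6 6 / 7 6 (one n per line)
```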

A separated pair of single-query 3-box systems 
(Sec. IV). One can imagine that the seer's three-box
system is such that only a single box (rather than a pair 
of boxes) can be opened at any given time, but that it 
is possible to prepare a pair of three-box systems such 
that by opening a single box on each element of the pair, 
one reproduces the seer's correlations. Specifically, if the 
same box is opened on each member of the pair, they 
are always found to be both full or both empty, while 
if different boxes are opened on the two systems, one is 
always found full and the other empty. (Classically, one 
would expect that some pair of boxes on a given wing 
are both full or both empty, and by the assumed perfect
correlation between the wings, the same pair is similarly 
configured on the other wing, implying that it is impossi- 
ble to open different boxes on the two systems and always 
find anti-correlation rather than correlation.) Here, we 
are postulating six distinct measurements (three on each 
wing) only certain pairs of which can be implemented 
jointly, namely, pairs that have one member from each 
wing. So again, there is no obstacle to finding a set of pro- 
jectors having this pattern of joint measurability. There 
are once again two ways of trying to obtain a quantum 
analogue of the new parable. 

i) Mermin's proof of Bell-nonlocality. Find a quantum 
state that yields perfect correlation when the same mea- 
surement is implemented on the two wings. Demonstrate 
that the extent to which it can yield anti-correlation when 
different boxes are opened is greater than is possible in a 
Bell-local model [10].

ii) Hardy's proof of Bell-nonlocality. Find a chain 
of choices of measurement, alternating between the two 
parties, and find a quantum state that yields implica- 
tions connecting particular outcomes of all but one mea- 
surement within this chain. Demonstrate that there is a 
nonzero probability for the kind of correlation exhibited 
by the last pair in the chain to be opposite to what one 
would expect by the transitivity of implication [11].
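To make the Mermin-type comparison concrete, here is a brute-force sketch in plain Python (our own illustration) of the best a deterministic Bell-local model can do. Each wing is given a fixed valuation of its three boxes; requiring perfect correlation whenever the same box is opened forces the two valuations to coincide, and the fraction of anti-correlated outcomes over the six ordered choices of different boxes is then at most 2/3. Bell-local models with shared randomness are mixtures of such strategies, and perfect same-box correlation must hold in every component that carries weight, so the same bound applies to them. We assume here that the pair of different boxes is chosen uniformly.

```python
from itertools import product

BOXES = range(3)
# Ordered pairs (box opened on wing A, box opened on wing B) with different boxes.
DIFFERENT = [(i, j) for i in BOXES for j in BOXES if i != j]

best = 0.0
# A deterministic local strategy fixes a valuation on each wing independently.
for a, b in product(product((0, 1), repeat=3), repeat=2):
    # Keep only strategies with perfect correlation when the same box is opened.
    if any(a[k] != b[k] for k in BOXES):
        continue
    anti = sum(a[i] != b[j] for i, j in DIFFERENT) / len(DIFFERENT)
    best = max(best, anti)

print(best)
```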

We also consider generalizations of these nonlocality 
proofs to rings of n measurements where only adjacent 
members can be implemented jointly. 

Interlude on frustrated networks (Sec. V). By
representing correlations between binary-valued observables
by frustrated networks, we provide a simple
categorization of some of the contextual and Bell-nonlocal
correlations outlined above.

A diachronic pair of single-queries of a 3-box 
system (Sec. VI). In this case, the seer's three-box system
is modified so that only a single box can be opened
at any given time, but that it is possible to implement 
two consecutive measurements in such a way that if the 
same box is opened at the two times, then the result of 
the measurement is always reproduced faithfully, while 
if different boxes are opened at the two times, then the
results are always different. In addition, we impose the 
constraint that no measurement at the second time can 
yield any information about the choice of the measure- 
ment at the first time. 

Now it is natural for a suitor to assume that statistical
indistinguishability among a set of choices implies that
they are equivalent at the level of an ontological model.
This assumption is known as preparation noncontextuality
[12]. It can be shown that no such preparation-noncontextual
model can reproduce the diachronic (two-time) correlations
stated above. But in quantum mechanics (which violates
preparation noncontextuality [12]), there are sets of
measurements for which these correlations can be
approximated even though the quantum state after the first
measurement reveals no information about the identity of
this measurement.

Joint measurability of POVMs (Sec. VII). A
final path to a quantum analogue of the overprotective
seer (OS) parable is to ignore the counterintuitive
correlations and instead concentrate on the complementarity
exhibited by the three boxes. As discussed above,
pairwise but not triplewise joint measurability of three
observables cannot occur in quantum mechanics for
traditional (projective) measurements. However, this does
not rule out the possibility that there exists a triple of
generalized measurements, described by POVMs
(positive-operator-valued measures), that can be jointly
measured pairwise but not triplewise. Indeed, we will exhibit
two specific examples of such a triple of nonprojective
measurements. This thread connects with some recent
results on joint measurability of POVMs [13-16]. We
demonstrate that this example is not useful for
approximating the OS correlations, nor for proving the
contextuality of quantum theory.
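The complementarity pattern described above can be probed with a standard family of unsharp qubit measurements, which we use here purely as an illustration; we do not claim it coincides with the examples of Sec. VII. The sketch, assuming NumPy, takes noisy spin measurements along x, y and z with effects (I ± ησ_k)/2 and tests one natural candidate joint POVM, with effects (I + η Σ_k s_k σ_k)/2^m, for each pair (m = 2) and for the triple (m = 3), checking positivity and that the marginals return the original effects. At η = 1/√2 the pairwise candidates are valid, while the triplewise candidate has a negative eigenvalue. The failure of this single candidate does not by itself rule out every other triplewise joint POVM; that requires a separate argument.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [SX, SY, SZ]


def unsharp_effect(k, sign, eta):
    """Effect for outcome `sign` (+1/-1) of a noisy spin measurement along axis k."""
    return 0.5 * (I2 + sign * eta * PAULIS[k])


def candidate_joint_povm(axes, eta):
    """Hypothetical joint POVM G(signs) = (I + eta * sum_k s_k sigma_k) / 2**m."""
    m = len(axes)
    return {
        signs: (I2 + eta * sum(s * PAULIS[k] for s, k in zip(signs, axes))) / 2**m
        for signs in product((1, -1), repeat=m)
    }


def is_valid_joint(axes, eta, tol=1e-12):
    G = candidate_joint_povm(axes, eta)
    positive = all(np.linalg.eigvalsh(E).min() >= -tol for E in G.values())
    # Summing over the other outcomes must return each unsharp effect.
    marginals_ok = all(
        np.allclose(
            sum(E for signs, E in G.items() if signs[pos] == s),
            unsharp_effect(k, s, eta),
        )
        for pos, k in enumerate(axes)
        for s in (1, -1)
    )
    return positive and marginals_ok


eta = 1 / np.sqrt(2)
pairs_ok = all(is_valid_joint(p, eta) for p in [(0, 1), (0, 2), (1, 2)])
triple_ok = is_valid_joint((0, 1, 2), eta)
print(pairs_ok, triple_ok)  # True False
```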



II. PRELIMINARIES AND A FORMALIZATION 
OF THE PARABLE 

A. Joint measurability 

We wish to flesh out the original parable by being 
more specific about the nature of the correlations posited 
therein. We shall do this within the context of
operational theories. This is natural because the OS parable
was originally presented by Specker as a "toy theory"
with similarities to quantum theory, not as a scenario
that arises within quantum theory. We thus need a



unified framework to compare the OS theory both with 
quantum theory, and with classical theories (i.e. theo- 
ries without contextuality or complementarity). Also, to 
make the most of the OS parable we need to embellish the 
narrative (in a formal way) by adding extra assumptions, 
and this requires considering measurements and
preparations beyond those discussed by Specker. Finally, we
note that in the fields of quantum foundations and quan- 
tum information, there is currently considerable interest 
in operational "foil" theories such as Popescu-Rohrlich 
(PR) boxes [17] and the toy-bit theory [18].

An operational theory is one that specifies the
probabilities of each possible outcome $X$ of each possible
measurement procedure $M$ given each possible preparation
procedure $P$. We denote these probabilities by
$p(X|M;P)$. It will be important for the later discussion
of contextuality to distinguish between a measurement
procedure $M$, which is a specification of a list of
instructions of what to do in the laboratory, and an
equivalence class $\mathcal{M}$ of measurement procedures,
where two procedures are equivalent if they yield the same
statistics for all preparation procedures. For instance, the
equivalence class associated with a particular measurement
procedure $M_1$ is

$$\mathcal{M}_1 = \{ M \mid \forall P : p(X|M;P) = p(X|M_1;P) \}. \tag{1}$$

We will refer to this equivalence relation over procedures 
as operational equivalence. We will refer to the equiva- 
lence classes as simply measurements, and denote them 
by calligraphic font, while the measurement procedures 
will be denoted by italic font. Similarly, we define equiv- 
alence classes of preparation procedures. For instance, 
the equivalence class associated with a particular
preparation procedure $P_1$ is

$$\mathcal{P}_1 = \{ P \mid \forall M : p(X|M;P) = p(X|M;P_1) \}. \tag{2}$$

Given that probabilities of outcomes of measurements 
depend only on the equivalence classes of the preparation 
and the measurement procedures, we typically condition 
only on the latter and write $p(X|\mathcal{M};\mathcal{P})$.
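As a minimal sketch of Eq. (1) in plain Python (the procedure names and numbers are hypothetical), operational equivalence classes can be computed from a finite table of statistics by grouping procedures whose outcome distributions agree for every preparation. Exact comparison of floating-point tuples is adequate for this toy data; real data would need a tolerance. Swapping the roles of the two indices gives the preparation classes of Eq. (2).

```python
from collections import defaultdict

# Hypothetical operational data: stats[M][P] is the outcome distribution
# p(X | M; P) as a tuple over outcomes X = 0, 1.
stats = {
    "M_a": {"P_1": (0.5, 0.5), "P_2": (1.0, 0.0)},
    "M_b": {"P_1": (0.5, 0.5), "P_2": (1.0, 0.0)},  # same statistics as M_a
    "M_c": {"P_1": (0.5, 0.5), "P_2": (0.0, 1.0)},
}


def equivalence_classes(stats):
    """Group procedures whose statistics agree for every preparation (Eq. (1))."""
    preparations = sorted({P for row in stats.values() for P in row})
    classes = defaultdict(list)
    for M, row in stats.items():
        signature = tuple(row[P] for P in preparations)
        classes[signature].append(M)
    return sorted(classes.values())


print(equivalence_classes(stats))  # [['M_a', 'M_b'], ['M_c']]
```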

We begin by providing an operational definition of joint 
measurability.⁵ We consider only measurements with a
discrete set of outcomes. 

Joint measurability of a set of N measurements can be 
defined (recursively) as follows. 

Definition 1 (joint measurability). A set of $N$
measurements $\{\mathcal{M}_1, \ldots, \mathcal{M}_N\}$ is jointly measurable if there
exists a measurement $\mathcal{M}$ with the following features: (i)
the outcome set of $\mathcal{M}$ is the Cartesian product of the
outcome sets of $\{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N\}$, and (ii) the
outcome distributions for every joint measurement of a
subset $\{\mathcal{M}_s \mid s \in S\} \subseteq \{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N\}$ are recovered as



5 This definition is a generalization of the notion of coexistence of
quantum observables provided in Ref. [T^l .
marginals of the outcome distribution of $\mathcal{M}$ for all
preparations $\mathcal{P}$. Denoting a joint measurement of the subset $S$
by $\mathcal{M}_S$ and its outcome by $X_S$,

$$\mathcal{M}_S = \{\mathcal{M}_s \mid s \in S\}, \qquad X_S = \{X_s \mid s \in S\}, \tag{3}$$

the condition can be expressed as

$$\forall S, \forall \mathcal{P}: \quad p(X_S|\mathcal{M}_S;\mathcal{P}) = \sum_{X_t:\, t \notin S} p(X_1, X_2, \ldots, X_N|\mathcal{M};\mathcal{P}). \tag{4}$$
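Condition (4) is just marginalization of a joint outcome table. A minimal NumPy sketch, assuming the statistics of $\mathcal{M}$ for one preparation are stored as an array with one axis per measurement (the function name `marginal` and the random table are our own):

```python
import numpy as np


def marginal(joint, S):
    """Return p(X_S | M_S; P) from a joint table p(X_1, ..., X_N | M; P).

    joint: array with one axis per measurement (axis s-1 holds X_s).
    S: iterable of 1-based indices of the measurements to keep.
    """
    keep = set(S)
    drop = tuple(t - 1 for t in range(1, joint.ndim + 1) if t not in keep)
    return joint.sum(axis=drop)  # sums over X_t for every t not in S


# Hypothetical joint distribution over three binary outcomes for one preparation.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

p13 = marginal(joint, {1, 3})  # 2x2 table over (X_1, X_3)
p1 = marginal(joint, {1})
print(p13.shape, np.isclose(p1.sum(), 1.0))
```

Checking Definition 1 for a candidate $\mathcal{M}$ then amounts to comparing `marginal(joint, S)` with the separately measured statistics of every $\mathcal{M}_S$, for every preparation.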

Definition 2 (n-tuple-wise joint measurability). A set of
measurements $\{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N\}$ is $n$-tuple-wise jointly
measurable if every $n$-element subset (i.e., every $n$-tuple
of measurements) is jointly measurable.

Clearly, joint measurability of all $n$-tuples implies joint
measurability of all $(n-1)$-tuples, but not vice versa.

Finally, we shall sometimes say that the measurements
in a set $\{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N\}$ exhibit complementarity if
they are not jointly measurable.

We can now be precise about the nature of the corre- 
lations in the overprotective seer's prediction game. 

Abstracting from the story of boxes and gems, the
parable posits that there are three distinct measurement
procedures, which we shall denote by $M_1$, $M_2$ and $M_3$
(corresponding to the choice of box). A key assumption
that is not explicit in Specker's description of the
prediction game is that these three measurement procedures
are not operationally equivalent. That is, for every pair,
say $M_1$ and $M_2$, there is a preparation procedure $P$ that
distinguishes them, so that $p(X|M_1;P) \neq p(X|M_2;P)$.
Making this assumption, we see that the game assumes
the existence of three distinct equivalence classes of
measurement procedures, which we denote by $\mathcal{M}_1$, $\mathcal{M}_2$ and
$\mathcal{M}_3$. Furthermore, it is assumed that these are pairwise
jointly measurable. It follows that there exist three joint
measurements, which we shall denote by $\mathcal{M}_{12}$, $\mathcal{M}_{13}$ and
$\mathcal{M}_{23}$ and which, by virtue of the definition of joint
measurability, must have statistics that reproduce the
statistics of $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$ as marginals. Note that, as the
notation suggests, $\mathcal{M}_{12}$, $\mathcal{M}_{13}$ and $\mathcal{M}_{23}$ correspond to
distinct equivalence classes of measurement procedures,
a fact that follows from the operational distinguishability
of $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$.

Note also that within the equivalence class of
measurement procedures $\mathcal{M}_1$, there are procedures $M_1'$ that
involve implementing a joint measurement of $\mathcal{M}_1$ and $\mathcal{M}_2$
and discarding the outcome of the $\mathcal{M}_2$ measurement, and
there are procedures $M_1''$ that involve implementing a
joint measurement of $\mathcal{M}_1$ and $\mathcal{M}_3$ and discarding the
outcome of the $\mathcal{M}_3$ measurement. Which of these two
procedures is implemented may be relevant in a
contextual hidden variable model, as we will see.

The seer's trick also requires that there is at least one
preparation, call it $\mathcal{P}^*$, that yields perfect negative
correlations for the joint measurement of any pair of $\mathcal{M}_1$, $\mathcal{M}_2$
and $\mathcal{M}_3$. Perfect negative correlation for a single joint
measurement of $\mathcal{M}_1$ and $\mathcal{M}_2$ does not imply that one
must have equal probability for $X_1 = 0, X_2 = 1$ and
$X_1 = 1, X_2 = 0$ (the two ways of achieving perfect
negative correlation). However, this equality does follow from
demanding perfect negative correlation for all three joint
measurements, as we show in Appendix A. Consequently,
the correlations are of the form

$$\forall i \neq j: \quad p(X_i = 0, X_j = 1|\mathcal{M}_{ij};\mathcal{P}^*) = p(X_i = 1, X_j = 0|\mathcal{M}_{ij};\mathcal{P}^*) = \tfrac{1}{2}. \tag{5}$$

We call these the overprotective seer correlations, or OS 
correlations. Note that it follows from this definition that 
individual measurements have a uniformly random out- 
come, 

$$p(X_i = 0|\mathcal{M}_i;\mathcal{P}^*) = p(X_i = 1|\mathcal{M}_i;\mathcal{P}^*) = \tfrac{1}{2}. \tag{6}$$
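The OS correlations can be written down as three 2×2 tables and checked against Eq. (6). This NumPy sketch (our own encoding: rows index $X_i$, columns $X_j$) confirms that each box has uniformly random statistics and, importantly, the same statistics whichever partner box it is opened with, as it must since $\mathcal{M}_k$ is a single equivalence class; every pair is perfectly anti-correlated.

```python
import numpy as np
from itertools import combinations

# Eq. (5): p(X_i, X_j | M_ij; P*) for each pair i < j; rows index X_i, columns X_j.
OS = {pair: np.array([[0.0, 0.5], [0.5, 0.0]]) for pair in combinations((1, 2, 3), 2)}


def single_box_marginal(pair, k):
    """Statistics of box k read off from the joint measurement of `pair`."""
    return OS[pair].sum(axis=1 - pair.index(k))


# Eq. (6): each box is uniformly random, and the same in every context containing it.
marginals_ok = all(
    np.allclose(single_box_marginal(pair, k), [0.5, 0.5])
    for k in (1, 2, 3)
    for pair in OS
    if k in pair
)

# Perfect anti-correlation: p(X_i != X_j) = p(0, 1) + p(1, 0) = 1 for every pair.
anti = {pair: float(t[0, 1] + t[1, 0]) for pair, t in OS.items()}
print(marginals_ok, anti)  # True {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
```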

B. The existence of a joint distribution 

The question of joint measurability concerns what is 
physically possible, not what is logically possible. If a 
physical theory postulates measurements that cannot be 
jointly implemented, it could still be that there is a joint 
probability distribution over the outcomes of these mea- 
surements that yields each measurement's statistics as a 
marginal. 

Definition 3 (existence of a joint distribution).
Consider a set of measurements $\{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N\}$. Let $S$
be a subset of their indices and denote the joint
measurement associated with this subset by $\mathcal{M}_S$ and its outcome
by $X_S$, as in Eq. (3). A joint distribution for the
measurements $\{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_N\}$ is said to exist if there
exists a distribution $p(X_1 \ldots X_N|\mathcal{P})$ for every preparation $\mathcal{P}$
such that for any measurement $\mathcal{M}_S$,

$$p(X_S|\mathcal{M}_S;\mathcal{P}) = \sum_{X_t:\, t \notin S} p(X_1 \ldots X_N|\mathcal{P}). \tag{7}$$

It is worth noting that within a given theory, the 
nonexistence of a joint distribution for some set of mea- 
surements implies the physical impossibility of a joint 
measurement of these. This follows from the fact that if 
a joint measurement is possible, then there must exist a 
joint distribution over the outcomes. However, the con- 
verse implication need not hold. For instance, there are 
theories, such as the toy theory of Ref. [18], which
postulate the physical impossibility of certain joint
measurements, but for which a joint distribution over outcomes
(effectively a hidden variable model) does exist. 

The feature of the OS correlations that is at the root 
of their peculiarities is the fact that they do not admit 
of a joint distribution. 

Lemma 4 (no joint distribution for OS correlations).
There is no distribution $p(X_1, X_2, X_3)$ on the three binary
variables $X_1$, $X_2$ and $X_3$ such that the marginals over
pairs of these are of the form of Eq. (5).

Proof. There are eight valuations of $(X_1, X_2, X_3)$, of
the form (0,0,0), (0,0,1), (0,1,0), ... But whichever
of these valuations is assigned nonzero probability in
$p(X_1, X_2, X_3)$, one of the three pairs $(X_1,X_2)$, $(X_1,X_3)$
or $(X_2,X_3)$ will have nonzero probability assigned either
to the valuation (0,0), "both boxes empty", or to the
valuation (1,1), "both boxes full". For this pair, either
$p(0,0) > 0$ or $p(1,1) > 0$, so that perfect anti-correlation
is not achieved. □
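The proof is a finite case check, so it can be mechanized. This plain-Python sketch (our own illustration; boxes are indexed from 0) lists, for each of the eight valuations, the pairs it makes "both empty" or "both full". Every valuation has at least one such pair, so no valuation may receive weight under the OS constraints and no normalized $p(X_1, X_2, X_3)$ survives.

```python
from itertools import combinations, product

PAIRS = list(combinations(range(3), 2))  # (X1,X2), (X1,X3), (X2,X3), 0-based


def correlated_pairs(v):
    """Pairs that valuation v makes 'both empty' or 'both full'."""
    return [p for p in PAIRS if v[p[0]] == v[p[1]]]


witness = {v: correlated_pairs(v) for v in product((0, 1), repeat=3)}

# Valuations compatible with perfect anti-correlation on all three pairs: none.
allowed = [v for v, pairs in witness.items() if not pairs]
print(len(allowed))  # 0
```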

Given the discussion above, this result has immediate 
(negative) consequences for the possibility of implementing
a triplewise joint measurement of $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$.

Corollary 5. Measurements $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$ that can
be pairwise jointly measured and that achieve the OS
correlations of Eq. (5) cannot be triplewise jointly measured.



C. Measurement-noncontextual ontological models

In this article, we will make use of the generalized
notion of noncontextuality introduced in Ref. [12], which is
operational insofar as it is defined for ontological
models of any operational theory, not just quantum theory.
An ontological model of an operational theory specifies:
(i) a set $\Lambda$ of ontic (i.e., real, physical) states $\lambda$; (ii)
for each preparation procedure $P$, a distribution $p(\lambda|P)$
describing the probability that the ontic state of the
system subsequent to the preparation procedure $P$ is $\lambda$; (iii)
for each measurement procedure $M$, a response function
$p(X|M;\lambda)$ describing the conditional probability of
obtaining outcome $X$ given ontic state $\lambda$. Finally, one must
recover the statistics of the operational theory as follows:



$$p(X|M;P) = \sum_{\lambda \in \Lambda} p(X|M;\lambda)\, p(\lambda|P). \tag{8}$$



Here we have taken $\lambda$ to be a discrete variable for
simplicity.
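With finitely many ontic states, Eq. (8) is a matrix product between the preparation distributions and the response functions. A NumPy sketch with a hypothetical four-state model (all numbers are illustrative):

```python
import numpy as np

# Rows: preparations P; columns: p(lambda | P) for four ontic states lambda = 0..3.
mu = np.array([
    [0.5, 0.5, 0.0, 0.0],      # P_1
    [0.25, 0.25, 0.25, 0.25],  # P_2
])

# Response function xi[lambda, X] = p(X | M; lambda) for one binary measurement M.
# The first two ontic states respond deterministically, the last two do not.
xi = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.3],
    [0.2, 0.8],
])

# Eq. (8): p(X | M; P) = sum over lambda of p(X | M; lambda) p(lambda | P).
p = mu @ xi
print(p)
```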

An ontological model is said to be
measurement-noncontextual if any two measurement procedures that
are operationally equivalent [in the sense of Eq. (1)] are
represented identically in the model:

$$\forall P: p(X|M;P) = p(X|M';P) \;\Rightarrow\; \forall \lambda: p(X|M;\lambda) = p(X|M';\lambda). \tag{9}$$



Equivalently, the condition is that the response function
for a measurement procedure $M$ depends only on its
operational equivalence class $\mathcal{M}$, that is,

$$p(X|M;\lambda) = p(X|\mathcal{M};\lambda). \tag{10}$$



An ontological model is said to be outcome-deterministic for a measurement procedure $M$ if the outcome is uniquely determined for every ontic state,

$$\forall\lambda\in\Lambda:\ p(X|M;\lambda)\in\{0,1\}. \tag{11}$$

The traditional notion of a noncontextual ontological model of quantum theory incorporated both the assumption of measurement noncontextuality and that of outcome determinism for projective measurements. Here, we will follow Ref. [12] and distinguish these assumptions so as not to conflate issues about determinism with issues about noncontextuality. To avoid terminological confusion, we shall say that an ontological model of quantum theory is traditionally-noncontextual if it is both measurement-noncontextual [in the sense of Eq. (9)] and outcome-deterministic for projective measurements. Any proof of the impossibility of a traditionally-noncontextual model of quantum theory will be called a proof of the Kochen-Specker theorem.

As it turns out, there is a close connection between the existence of a joint distribution and noncontextuality:

Theorem 6. For a given set of measurements, if there exists a measurement-noncontextual and outcome-deterministic ontological model, then there exists a joint distribution for their outcomes.

The proof is provided in Appendix B. This is a slight generalization of half of a theorem by Fine [19]. Combining this theorem with the nonexistence of a joint distribution for the OS correlations (Lemma 4), we have:

Corollary 7. There is no measurement-noncontextual and outcome-deterministic ontological model of the OS correlations of Eq. (5).

It is also possible to write down inequalities which must be satisfied by the experimental statistics if these are to admit of an explanation in terms of a measurement-noncontextual and outcome-deterministic model. We will call these Kochen-Specker inequalities. For the case of the OS correlations, if we imagine such a model, then each box must be either empty or full. Consequently, at most two of the three pairs can exhibit anti-correlation, so if we choose a pair of boxes uniformly at random, the probability of obtaining anti-correlated outcomes is bounded above by 2/3. More precisely, if $p(X_i\neq X_{i\oplus1}|M_{i,i\oplus1};P)$ denotes the probability of obtaining anti-correlated outcomes in a joint measurement of $M_i$ and $M_{i\oplus1}$, where $\oplus$ denotes addition modulo 3, then the average probability of success is

$$R_3 \equiv \sum_{i=1}^{3}\frac{1}{3}\,p(X_i\neq X_{i\oplus1}|M_{i,i\oplus1};P), \tag{12}$$

and it satisfies

$$R_3 \le \frac{2}{3}. \tag{13}$$

This is a Kochen-Specker inequality.

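It is sometimes useful to express Kochen-Specker inequalities in an algebraic form. We define new variables $\mathcal{X}_i = (-1)^{X_i}$, so that $\mathcal{X}_i = +1$ $(-1)$ when $X_i = 0$ $(1)$. Using angle brackets to denote averages, we consider the following combination of correlation functions:

$$S_3 \equiv \langle\mathcal{X}_1\mathcal{X}_2\rangle + \langle\mathcal{X}_2\mathcal{X}_3\rangle + \langle\mathcal{X}_3\mathcal{X}_1\rangle. \tag{14}$$

Since $\langle\mathcal{X}_i\mathcal{X}_{i\oplus1}\rangle = 1 - 2\,p(X_i\neq X_{i\oplus1}|M_{i,i\oplus1};P)$, we have $S_3 = 3 - 6R_3$, and inequality (13) takes the form

$$S_3 \ge -1. \tag{15}$$

The OS correlations, however, require $\langle\mathcal{X}_i\mathcal{X}_{i\oplus1}\rangle = -1$ for all $i$, and hence $S_3 = -3$, clearly violating the bound.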
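The bounds in Eqs. (13) and (15) can be checked by brute force. The Python sketch below is an illustration written for this discussion (the function names are hypothetical, not from the original work): it enumerates all eight deterministic, context-independent assignments of gems to the three boxes and computes $R_3$ and $S_3$ for each. Because the statistics of any measurement-noncontextual, outcome-deterministic model are convex mixtures of such assignments, the extreme values found here bound every such model.

```python
from fractions import Fraction
from itertools import product

PAIRS = ((0, 1), (1, 2), (2, 0))   # the jointly measurable pairs {1,2}, {2,3}, {3,1}

def r3_and_s3(assignment):
    """(R_3, S_3) for one deterministic assignment (X_1, X_2, X_3), each X_i in {0, 1}."""
    anti = sum(1 for i, j in PAIRS if assignment[i] != assignment[j])
    signs = [(-1) ** x for x in assignment]          # the +/-1 variables (-1)^{X_i}
    s3 = sum(signs[i] * signs[j] for i, j in PAIRS)
    return Fraction(anti, 3), s3

results = {a: r3_and_s3(a) for a in product((0, 1), repeat=3)}
max_r3 = max(r for r, _ in results.values())
min_s3 = min(s for _, s in results.values())
print(max_r3, min_s3)   # prints: 2/3 -1  (the OS correlations need R_3 = 1, S_3 = -3)
```

The exact `Fraction` arithmetic also makes it easy to confirm the identity $S_3 = 3 - 6R_3$ assignment by assignment.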

With the correlations in this form, one can also express a proof of the impossibility of an outcome-deterministic noncontextual model in the algebraic manner introduced by Mermin [20]. Assuming that each $\mathcal{X}_i \in \{+1,-1\}$ has a value independent of context, the OS correlations require that these values satisfy the following algebraic relations:

$$\begin{aligned} \mathcal{X}_1\mathcal{X}_2 &= -1,\\ \mathcal{X}_1\mathcal{X}_3 &= -1,\\ \mathcal{X}_2\mathcal{X}_3 &= -1. \end{aligned} \tag{16}$$

However, these relations cannot be satisfied, because the product of the left-hand sides is $\mathcal{X}_1^2\mathcal{X}_2^2\mathcal{X}_3^2 = +1$, while the product of the right-hand sides is $-1$.

Any theory that realizes the OS correlations fails to admit of a measurement-noncontextual and outcome-deterministic ontological model. However, as explained in the introduction, the kind of complementarity one requires to achieve these correlations, namely three measurements that are pairwise but not triplewise jointly measurable, cannot arise for projective measurements in quantum theory. In Sec. III we turn to the modifications of the parable that do have a counterpart in quantum theory.

D. Preparation noncontextuality 

The notion of measurement noncontextuality defined in Eq. (9) is motivated by a kind of equivalence principle: in the absence of observable differences between measurement procedures (i.e. differences in their statistics), one should not posit differences in their representations in the ontological model. In Ref. [12] it was argued that the same principle should lead one to an assumption of noncontextuality for preparation procedures. Specifically, an ontological model is said to be preparation-noncontextual if any two preparation procedures that are operationally equivalent [in the sense of Eq. (|2[)] are represented equivalently in the model:

$$\forall M:\ p(X|M;P) = p(X|M;P') \;\Longrightarrow\; \forall\lambda:\ p(\lambda|P) = p(\lambda|P'). \tag{17}$$

Preparation noncontextuality can also be characterized as the condition that the distribution for a preparation procedure $P$ depends only on its operational equivalence class $\mathcal{P}$, that is,

$$p(\lambda|P) = p(\lambda|\mathcal{P}). \tag{18}$$

Given their similar motivations, someone who endorses measurement noncontextuality ought also to endorse preparation noncontextuality just as enthusiastically. One should endorse both notions or neither. Therefore, it is most natural to ask about the possibility of an ontological model that is both preparation-noncontextual and measurement-noncontextual. We will call such models generalized-noncontextual (in Ref. [13], these were called "universally noncontextual"). In this paper, we will consider suitors faced with the seer's prediction problem who are committed to the kind of equivalence principle described above and therefore to generalized noncontextuality.

Inequalities that must be satisfied by the experimental statistics if these are to admit of a generalized-noncontextual model will be called simply noncontextuality inequalities. Note that our terminology distinguishes such inequalities from the Kochen-Specker inequalities of the previous section: Kochen-Specker inequalities express constraints on statistics when one assumes outcome determinism in addition to measurement noncontextuality, while noncontextuality inequalities rely on no such assumption of determinism. An example of a noncontextuality inequality will be provided in Sec. VI.

E. Justifying outcome determinism

Note that a commitment to the kind of equivalence principle described above does not obviously provide any grounds for assuming outcome determinism for measurements, Eq. (11). Thus, faced with the OS correlations and Corollary 7, a suitor might simply deny outcome determinism to salvage measurement noncontextuality. For instance, seeing the correlations in the seer's prediction game, a clever suitor might hypothesize that they are explained by the following sort of model. There is an ontic variable that flags when the preparation $P_*$ was implemented and, if it was, the measurements $M_{12}$, $M_{13}$ and $M_{23}$ each generate the outcomes (0,1) and (1,0) uniformly at random. Such an ontological model would violate outcome determinism, but would preserve measurement noncontextuality.

On the other hand, the assumption of outcome determinism can sometimes be shown to be a consequence of preparation noncontextuality. If such a justification is forthcoming, then the OS correlations cannot be explained by any ontological model that is generalized-noncontextual. For instance, in quantum theory, the assumption of outcome determinism for projective measurements can be derived from preparation noncontextuality, as shown in Ref. [12]. Therefore, in quantum theory the conjunction of measurement noncontextuality and outcome determinism for projective measurements (i.e. the assumption of traditional noncontextuality of an ontological model) is implied by the assumption of generalized noncontextuality, and all the no-go theorems for the former are no-go theorems for the latter.
In Sec. III we will provide proofs of the failure of traditional noncontextuality in quantum theory using a generalization of the OS correlations. Given the result just mentioned, such proofs also demonstrate the failure of generalized noncontextuality.
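The clever suitor's model described earlier in this subsection can be written out explicitly to confirm that it does what is claimed. The sketch below is our own encoding, not the authors': the single ontic state `"flag"` and all function names are hypothetical. It reproduces the OS correlations through Eq. (8), gives each box the same response function whichever partner box it is opened with (the operational equivalence at stake here is between the single-box measurements obtained by coarse-graining $M_{ij}$ and $M_{ik}$), and fails outcome determinism in the sense of Eq. (11).

```python
from fractions import Fraction
from itertools import product

# Hypothetical encoding: one ontic state, "flag", occupied whenever P_* is implemented.
ONTIC_STATES = ("flag",)
PAIRS = ((1, 2), (1, 3), (2, 3))
OUTCOMES = tuple(product((0, 1), repeat=2))

def p_lambda(lam, prep):
    """p(lambda | P): the flag is set with certainty by P_*."""
    return Fraction(1) if (prep == "P_star" and lam == "flag") else Fraction(0)

def response(outcome, pair, lam):
    """p(X_i, X_j | M_ij; lambda): (0,1) or (1,0), each with probability 1/2, for every pair."""
    return Fraction(1, 2) if outcome in ((0, 1), (1, 0)) else Fraction(0)

def operational(outcome, pair, prep):
    """Eq. (8): p(X | M; P) = sum over lambda of p(X | M; lambda) p(lambda | P)."""
    return sum(response(outcome, pair, lam) * p_lambda(lam, prep) for lam in ONTIC_STATES)

def single_box_response(box, value, pair, lam):
    """Response function for one box, obtained by coarse-graining the joint measurement M_pair."""
    k = pair.index(box)
    return sum(response(o, pair, lam) for o in OUTCOMES if o[k] == value)

# 1. The model reproduces the OS correlations.
reproduces_os = all(operational((0, 1), pr, "P_star") == Fraction(1, 2)
                    and operational((1, 0), pr, "P_star") == Fraction(1, 2)
                    for pr in PAIRS)
# 2. Measurement noncontextuality: box i responds identically whichever pair it is measured in.
noncontextual = all(single_box_response(i, v, p, lam) == single_box_response(i, v, q, lam)
                    for lam in ONTIC_STATES for i in (1, 2, 3) for v in (0, 1)
                    for p in PAIRS for q in PAIRS if i in p and i in q)
# 3. Outcome determinism fails: the single-box response is 1/2, not 0 or 1.
deterministic = all(single_box_response(i, 1, p, lam) in (0, 1)
                    for lam in ONTIC_STATES for p in PAIRS for i in p)
print(reproduces_os, noncontextual, deterministic)
```

The point of the exercise is that nothing in measurement noncontextuality alone forbids this model; it is only the additional demand of outcome determinism (or its derivation from preparation noncontextuality) that rules it out.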

Much of this article makes statements about correla- 
tions that are not found in quantum theory but can easily 
be imagined to occur in more general operational theo- 
ries. In such theories, a natural analogue of the notion 
of a projective measurement can be defined. The ques- 
tion thus arises of whether the assumption of preparation 
noncontextuality might imply outcome determinism for 
such measurements for an ontological model of a general 
operational theory. The question is currently open, but 
we conjecture that it has a positive answer. 

Fortunately, we can still draw some negative conclu- 
sions about generalized noncontextuality in operational 
theories without settling this conjecture. Specifically, in Sec. VI we will demonstrate how a slight modification of the seer's game yields a set of correlations that fails to admit of a preparation-noncontextual ontological model.

III. NO-GO THEOREMS FOR 
MEASUREMENT-NONCONTEXTUAL AND 
OUTCOME-DETERMINISTIC MODELS 

A. A double-query n-box system allowing only 
adjacent queries 

One way to generalize Specker's parable is to consider $n > 3$ boxes, and allow only certain pairs to be opened jointly. In particular, one can imagine the boxes to be arranged in a ring with adjacent pairs being the only ones that can be opened jointly. The resulting pattern of joint measurability can be reproduced in quantum theory because there exist ordered sets of $n > 3$ projectors for which adjacent elements commute (where adjacency is determined modulo $n$). If $n$ is odd, then for every deterministic and noncontextual assignment of gems to boxes that the suitor might make, there must exist at least one adjacent pair of boxes that are either both full or both empty. Indeed, given any assignment of gems to boxes, if we choose an adjacent pair of boxes uniformly at random, the probability of obtaining anti-correlated outcomes is bounded above by $(n-1)/n$. We then imagine that the seer has a special system such that, regardless of which adjacent pair of boxes is opened, it is always the case that one is found full and the other empty. (For even $n$, one cannot develop an interesting parable, because there are assignments of gems to boxes wherein no adjacent pair is both full or both empty.) For these correlations, unlike those described in the original parable, one can find a quantum analogue. Although this analogue does not allow the seer to always defeat the suitor's prediction, the probability of finding anti-correlation between a pair of adjacent boxes can be greater than the success rate of $(n-1)/n$ expected by classical reasoning.

Let us consider this situation more carefully. We are imagining an odd number $n \ge 5$ of measurements, $\{M_a \,|\, a = 1,\ldots,n\}$, such that for all $a$, $M_a$ and $M_{a\oplus1}$ are jointly measurable by a measurement $M_{a,a\oplus1}$ (here $\oplus$ denotes addition modulo $n$), and that there is at least one preparation, call it $P_*$, such that the outcomes of all of these pairs of measurements are anti-correlated. By a generalization of the argument provided in Appendix A, the correlations must be of the form

$$\forall a:\quad p(X_a=0,X_{a\oplus1}=1|M_{a,a\oplus1};P_*) = \tfrac{1}{2},\qquad p(X_a=1,X_{a\oplus1}=0|M_{a,a\oplus1};P_*) = \tfrac{1}{2}. \tag{19}$$

We will call these the double-query $n$-box OS correlations.

By an argument analogous to the one proving Lemma 4, one can show that there is no joint distribution over all the $X_a$ that reproduces these correlations as marginals. It then follows from Theorem 6 that there is no measurement-noncontextual and outcome-deterministic ontological model of these correlations.

Indeed, if we choose an adjacent pair of boxes uniformly at random, the probability $R_n$ of obtaining anti-correlated outcomes,

$$R_n \equiv \sum_{a=1}^{n}\frac{1}{n}\,p(X_a\neq X_{a\oplus1}|M_{a,a\oplus1};P), \tag{20}$$

is clearly bounded above,

$$R_n \le 1 - \frac{1}{n} \tag{21}$$

(because at most $n-1$ pairs can be anti-correlated if $n$ is odd). The double-query $n$-box OS correlations yield $R_n = 1$, maximally violating this Kochen-Specker inequality.

We may equivalently state the restriction as follows. Following the convention established in Sec. II C, we define $\mathcal{X}_a = (-1)^{X_a} \in \{+1,-1\}$. For all measurement-noncontextual and deterministic assignments of the values $\mathcal{X}_a$, at most $n-1$ of the pairwise products can be $-1$, so that

$$S_n \equiv \langle\mathcal{X}_1\mathcal{X}_2\rangle + \langle\mathcal{X}_2\mathcal{X}_3\rangle + \cdots + \langle\mathcal{X}_n\mathcal{X}_1\rangle \ge -(n-2), \tag{22}$$

whereas the double-query $n$-box OS correlations give $S_n = -n$.
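A similar enumeration covers the ring of $n$ boxes. The sketch below (illustrative names, exhaustive over all $2^n$ deterministic assignments) computes the largest anti-correlation rate and the smallest value of $S_n$. For odd $n$ it should reproduce $1 - 1/n$ and $-(n-2)$ from Eqs. (21) and (22); for even $n$ an alternating assignment anti-correlates every adjacent pair, which is why only odd $n$ gives an interesting parable.

```python
from fractions import Fraction
from itertools import product

def ring_stats(assignment):
    """(R_n, S_n) for one deterministic assignment of 0/1 values to a ring of n boxes."""
    n = len(assignment)
    anti = sum(assignment[a] != assignment[(a + 1) % n] for a in range(n))
    # Each adjacent product of +/-1 variables is -1 if anti-correlated and +1 otherwise.
    s_n = n - 2 * anti
    return Fraction(anti, n), s_n

def best_classical(n):
    """Largest R_n and smallest S_n over all 2**n deterministic assignments."""
    stats = [ring_stats(a) for a in product((0, 1), repeat=n)]
    return max(r for r, _ in stats), min(s for _, s in stats)

for n in range(3, 10):
    r_max, s_min = best_classical(n)
    print(n, r_max, s_min)
```

Brute force is only practical for small $n$, but the parity argument behind Eq. (23) shows the pattern holds for every odd $n$.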

Again, a simple algebraic way of manifesting the fact that the double-query $n$-box correlations do not admit of a measurement-noncontextual and outcome-deterministic model is that they require $\mathcal{X}_a \in \{+1,-1\}$ such that

$$\begin{aligned} \mathcal{X}_1\mathcal{X}_2 &= -1,\\ \mathcal{X}_2\mathcal{X}_3 &= -1,\\ &\ \,\vdots\\ \mathcal{X}_{n-1}\mathcal{X}_n &= -1,\\ \mathcal{X}_n\mathcal{X}_1 &= -1, \end{aligned} \tag{23}$$

but the product of the left-hand sides is $\mathcal{X}_1^2\mathcal{X}_2^2\cdots\mathcal{X}_n^2 = +1$, while the product of the right-hand sides is $(-1)^n = -1$, since $n$ is odd.

We now consider what values of R n and S n can be 
achieved in quantum theory. 

B. Klyachko's proof of the Kochen-Specker 
theorem 

We require $n$ Hermitian observables $X_1,\ldots,X_n$, each having eigenvalues 0 and 1, associated with the $n$ measurements $M_1,\ldots,M_n$. As discussed in Sec. II B, for the specific case of $n = 3$, the pairwise commutativity of $X_1$, $X_2$ and $X_3$ implies their triplewise commutativity and consequently the existence of a triplewise joint measurement and of a measurement-noncontextual and outcome-deterministic model.

Nonetheless, we can obtain something interesting for 
odd n greater than 3. We begin with the case of n = 5. 
A no-go theorem of this sort has recently been given by 
Klyachko [H| (see also Refs. H and Q). The con- 
struction is as follows. We consider a quantum system 
described by a 3-dimensional Hilbert space, and all of 
the states we consider require only real-valued coefficients in some basis. Thus the system can be visualized in 3-dimensional Euclidean space. The observables are projectors $X_a = |l_a\rangle\langle l_a|$, where the vectors $\{|l_a\rangle : a = 1,\ldots,5\}$ are of the form

$$|l_a\rangle = (\sin\theta\cos\varphi_a,\ \sin\theta\sin\varphi_a,\ \cos\theta), \tag{24}$$

and $\varphi_a = 4\pi a/5$, so that the sequence of vectors forms a pentagram, as in Fig. 1. The angle $\theta$ is chosen such that vectors adjacent in the sequence are orthogonal, $\langle l_a|l_{a\oplus1}\rangle = 0$, where $\oplus$ denotes sum modulo 5. As a result of this orthogonality relation, adjacent observables $X_a$, $X_{a\oplus1}$ are indeed jointly measurable. It is clear that such a value of $\theta$ exists because as it varies from 0 to $\pi/2$ the angle between adjacent vectors varies from 0 to $4\pi/5$. In fact, orthogonality is achieved at $\cos\theta = 1/\sqrt[4]{5}$.

Another way to see that the Kochen-Specker inequality (15) cannot be violated in quantum theory is that we can treat $S_3$ as a polynomial of commuting variables, and thus its minimum can be attained by assigning a value to each variable in a noncontextual and deterministic manner.

FIG. 1. Quantum states and observables used for Klyachko's proof of contextuality and the proof of contextuality via the failure of transitivity of implication.

Now consider a preparation of the quantum state $|\psi_1\rangle$ corresponding to the vector lying along the symmetry axis of the pentagram, such that the angle between it and each of the $|l_a\rangle$ is $\theta$. In a measurement of any adjacent pair of observables $X_a$, $X_{a\oplus1}$, either just one of them yields the outcome 1, in which case the outcomes are anti-correlated, or both yield the outcome 0. The probability for anti-correlation is $2\cos^2\theta = 2/\sqrt{5}$, which implies that the Kochen-Specker bound of Eq. (21) is violated,

$$R_5 = R_5^{\rm quantum} = \frac{2}{\sqrt{5}} \approx 0.894 > \frac{4}{5}. \tag{25}$$

Equivalently, with the observables $\mathcal{X}_a = 2X_a - \mathbb{1}$, where $\mathbb{1}$ is the identity operator, the state achieves $S_5 = 5 - 4\sqrt{5} \approx -3.944 < -3$.

The value $2/\sqrt{5}$ is in fact the maximum possible quantum violation of this Kochen-Specker inequality. We show this in Appendix C with the help of the converging hierarchy of semidefinite programming (SDP) tools discussed in Ref. [H] (see also Eqs. (26) and (27) below).
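The numbers quoted for the pentagram can be reproduced numerically. The sketch below builds the five vectors of Eq. (24) with $\varphi_a = 4\pi a/5$ and $\cos\theta = 5^{-1/4}$, takes $|\psi_1\rangle$ to be the unit vector along the $z$ axis (the symmetry axis in the coordinates of Eq. (24)), and evaluates $R_5$ and $S_5$ with the Born rule. It uses plain Python floats; the variable names are ours, not the paper's.

```python
import math

N = 5
COS_T = 5 ** -0.25                     # cos(theta) = 1/5^(1/4), as stated in the text
SIN_T = math.sqrt(1 - COS_T ** 2)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Eq. (24) with phi_a = 4*pi*a/5, a = 1, ..., 5 (list index a-1)
l = [(SIN_T * math.cos(4 * math.pi * a / N),
      SIN_T * math.sin(4 * math.pi * a / N),
      COS_T) for a in range(1, N + 1)]

# Adjacent vectors (indices modulo 5) should be orthogonal.
adjacent_overlaps = [dot(l[a], l[(a + 1) % N]) for a in range(N)]

psi1 = (0.0, 0.0, 1.0)                  # symmetry axis of the pentagram
p_one = [dot(v, psi1) ** 2 for v in l]  # p(X_a = 1) = |<l_a|psi_1>|^2 = cos^2(theta)

# Orthogonal projectors never both give 1, so p(anti) = p(X_a = 1) + p(X_{a+1} = 1).
p_anti = [p_one[a] + p_one[(a + 1) % N] for a in range(N)]
R5 = sum(p_anti) / N                    # Eq. (20) with n = 5
S5 = sum(1 - 2 * p for p in p_anti)     # each correlator equals 1 - 2 p(anti)

print(max(abs(x) for x in adjacent_overlaps), R5, S5)
```

Both figures of merit are computed from the same `p_anti` list, which makes the equivalence between the $R_5$ and $S_5$ forms of the inequality explicit.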

Note that unlike the no-coloring proofs of the Kochen- 
Specker theorem, this is a state-specific proof 0, HH 
In fact, for a 3-dimensional quantum system, this is a state-specific proof that involves the smallest set of vectors $\{|l_a\rangle\}$ satisfying the orthogonality relation $\langle l_a|l_{a\oplus1}\rangle = 0$.



9 Note also that the claim in Klyachko et al. that the inequality 
in question provides a "test of arbitrary hidden variables model, 
context free or not" is mistaken. If the values assigned to ob- 
servables could be context-dependent, there would be no contra- 
diction. 

The 2-vector and 3-vector cases are trivial. For the 4-vector case, note that the orthogonality relation $\langle l_a|l_{a\oplus1}\rangle = 0$ implies that some pair of non-adjacent vectors must be collinear which, in turn, implies that all four projectors $|l_a\rangle\langle l_a|$ must commute, and therefore cannot violate any Kochen-Specker inequality.
1. Generalization to all odd n.

One can generalize Klyachko's no-go result to all odd $n$ as follows. Define $n$ observables by the projectors onto vectors $\{|l_a\rangle : a = 1,\ldots,n\}$ defined as in Eq. (24), but with $\varphi_a = \frac{n-1}{n}\pi a$ and with $\theta$ chosen such that $\langle l_a|l_{a\oplus1}\rangle = 0$, where $\oplus$ denotes sum modulo $n$. This is achieved when $\cos^2\theta = \cos(\pi/n)/[1+\cos(\pi/n)]$. This set of $n$ vectors forms what is known as an $\{n/\frac{n-1}{2}\}$ star polygon [26]. The $\{5/2\}$, $\{7/3\}$ and $\{9/4\}$ star polygons are depicted in Fig. 2. Again, preparing the quantum state on the symmetry axis of the star polygon, the probability of anti-correlation for adjacent observables violates the Kochen-Specker bound of Eq. (21) with



$$R_n = R_n^{\rm quantum} = \frac{2\cos(\pi/n)}{1+\cos(\pi/n)} > 1 - \frac{1}{n}, \tag{26}$$

or equivalently, the Kochen-Specker bound of Eq. (22) with

$$S_n = S_n^{\rm quantum} = n - 2nR_n^{\rm quantum} = n - \frac{4n\cos(\pi/n)}{1+\cos(\pi/n)} < 2 - n. \tag{27}$$



As with the $n = 5$ case, these values also represent the strongest possible quantum violation of these Kochen-Specker inequalities, as is shown in Appendix C. At large $n$, the quantum probability approaches unity quadratically as

$$R_n^{\rm quantum} \approx 1 - \frac{\pi^2}{4n^2}, \tag{28}$$

in contrast to the linear approach to unity of the Kochen-Specker bound.
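The general construction can be checked the same way. The sketch below rests on the same assumptions as the $n = 5$ check (real 3-dimensional vectors, state along the $z$ axis, names chosen for illustration). It rebuilds the star-polygon vectors for several odd $n$, confirms that adjacent vectors are orthogonal, and compares the resulting anti-correlation rate with the closed form of Eq. (26), the classical bound $1 - 1/n$, and the asymptotic form of Eq. (28).

```python
import math

def star_polygon_rn(n):
    """Return (max |<l_a|l_{a+1}>|, R_n) for the odd-n star polygon with the state on the z axis."""
    c = math.cos(math.pi / n)
    cos_t = math.sqrt(c / (1 + c))               # cos^2(theta) = cos(pi/n) / (1 + cos(pi/n))
    sin_t = math.sqrt(1 - cos_t ** 2)
    phis = [(n - 1) * math.pi * a / n for a in range(1, n + 1)]
    l = [(sin_t * math.cos(p), sin_t * math.sin(p), cos_t) for p in phis]
    # For odd n, phi_{n+1} = phi_1 + (n-1)*pi is phi_1 modulo 2*pi, so indices wrap modulo n.
    overlap = max(abs(sum(x * y for x, y in zip(l[a], l[(a + 1) % n]))) for a in range(n))
    p_one = [v[2] ** 2 for v in l]               # |<l_a|psi>|^2 with psi = (0, 0, 1)
    r_n = sum(p_one[a] + p_one[(a + 1) % n] for a in range(n)) / n
    return overlap, r_n

for n in (5, 7, 9, 21, 101):
    overlap, r_n = star_polygon_rn(n)
    closed_form = 2 * math.cos(math.pi / n) / (1 + math.cos(math.pi / n))   # Eq. (26)
    asymptotic = 1 - math.pi ** 2 / (4 * n ** 2)                          # Eq. (28)
    print(n, overlap, r_n, closed_form, 1 - 1 / n, asymptotic)
```

For $n = 3$ the same formula gives exactly $2/3 = 1 - 1/3$, consistent with the absence of a violation in that case.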




FIG. 2. $\{n/\frac{n-1}{2}\}$ star polygons for $n$ = 5, 7, and 9.

It is worth emphasizing that by using the quantum cor- 
relations for n measurements, the seer can achieve some- 
thing very close to the ends he achieved in the original 
parable. Specifically, the seer can construct a prediction 
game such that suitors who reason classically think the 
game is fair (i.e. they think it is highly likely that some 
suitor will win) when in fact it is not (because classical 
reasoning does not apply and it is actually highly unlikely
that any suitor will win). 

The prediction game that meets the seer's ends is as follows. The suitor is asked to pick an adjacent pair of boxes that he believes to be both empty or both full and to open those. If his prediction for those two boxes is correct, the suitor wins; otherwise he loses. With what probability will a suitor who reasons classically expect to win? We presume that he knows the seer to be adversarial, and so he reasons that the seer has prepared a classical configuration which makes his [the suitor's] task as difficult as possible. He reasons therefore that the configuration is one wherein only one adjacent pair of boxes is both full or both empty (by his classical lights, he knows that there must be at least one such pair for an odd number of boxes). Thus the suitor expects his probability of winning to be the probability that he has guessed correctly which of all the $n$ pairs is the correlated one, times the probability that he has guessed their contents correctly: overall, a probability of $1/(2n)$. In fact, the probability of the suitor's prediction coming true is only of order $1/n^2$ in the quantum scheme described above. Let us say the number of suitors is $l$, assumed large. Then if the seer chooses the number of boxes $n$ such that $n \ll l \ll n^2$, the suitors believe it to be very likely that one of them will win when in fact it is very likely that none of them will win.
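To put rough numbers on this, the sketch below compares the chance that at least one of $l$ suitors wins, first as the classically reasoning suitors estimate it and then in the quantum scheme. Two assumptions are ours, not the paper's: each suitor plays an independent round, and in the quantum case the suitor always predicts "both empty" (adjacent boxes are never both full there, so this is his best option), giving a per-suitor success probability $1 - R_n^{\rm quantum} = (1-\cos\frac{\pi}{n})/(1+\cos\frac{\pi}{n}) = \tan^2\frac{\pi}{2n}$. The values $n = 1001$ and $l = 30000$ are an illustrative choice satisfying $n \ll l \ll n^2$.

```python
import math

def classical_expected_win(n):
    """What a classical suitor expects: pick the one correlated pair (1/n), guess its contents (1/2)."""
    return 1 / (2 * n)

def quantum_win(n):
    """Per-suitor win probability in the star-polygon scheme if he always predicts 'both empty'.

    1 - R_n^quantum = (1 - cos(pi/n)) / (1 + cos(pi/n)) = tan(pi/(2n))**2, written this way
    to avoid cancellation for large n.
    """
    return math.tan(math.pi / (2 * n)) ** 2

def prob_some_suitor_wins(p_single, num_suitors):
    # Assumes every suitor faces an independent round of the game.
    return 1 - (1 - p_single) ** num_suitors

n, l = 1001, 30_000   # illustrative: n << l << n**2
print(prob_some_suitor_wins(classical_expected_win(n), l),
      prob_some_suitor_wins(quantum_win(n), l))
```

With these values the classical estimate is essentially certain success for some suitor, while the quantum probability stays below ten percent.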



C. A proof of the Kochen-Specker theorem based 
on the failure of transitivity of implication 

Specker's intent in introducing his parable was to demonstrate the logical possibility of a failure of the transitivity of implication. The idea is straightforward. Suppose $s_1$, $s_2$ and $s_3$ are propositions that assert the presence of a gem in boxes 1, 2 and 3 respectively, and $\neg s_1$, $\neg s_2$ and $\neg s_3$ assert their negations. We have $s_1 \Rightarrow \neg s_2$ (because boxes 1 and 2 are never found both full), and $\neg s_2 \Rightarrow s_3$ (because boxes 2 and 3 are never found both empty). If implication were transitive, then we could conclude that $s_1 \Rightarrow s_3$. But in fact we have $s_1 \Rightarrow \neg s_3$ (because boxes 1 and 3 are never found both full). Therefore, assuming a gem is sometimes found in box 1, transitivity fails.

Specker's 1960 article was framed within the tradition 
of quantum logic, and although some researchers have 
proposed that quantum theory might require us to abandon some of the rules of classical logic as rules of right reasoning (see, for example, Ref. [27]), we will not consider this possibility here. Indeed, if we incorporate the context of a measurement in the propositions, so that we distinguish $s_1$, finding a gem in box 1 in the context of measuring box 1 with box 2, from $s_1'$, finding a gem in box 1 in the context of measuring box 1 with box 3, then
the transitivity of implication can be salvaged and there 
is no challenge to classical logic. 

Nonetheless, the failure of the transitivity of implica- 
tion provides another perspective on how to generate 
no-go results for measurement-noncontextual outcome- 
deterministic models. In such models, implications 
among value assignments of observables are necessarily 
transitive because these value assignments do not depend
on the context of the measurement. A failure of the tran- 
sitivity of implication therefore implies the impossibility 
of such a model. 

In the case of the double-query n-box OS correlations, 
if n is odd, the perfect anti-correlations justify the fol- 
lowing implications around the ring of boxes, 

$$X_1 = 1 \Rightarrow X_2 = 0 \Rightarrow X_3 = 1 \Rightarrow \cdots \Rightarrow X_n = 1 \Rightarrow X_1 = 0. \tag{29}$$

By the transitivity of implication, we would conclude that $X_1 = 1 \Rightarrow X_1 = 0$. Given that $X_1 = 1$ is sometimes observed, one has a contradiction. Consequently, the observation of the double-query $n$-box OS correlations implies the impossibility of a measurement-noncontextual outcome-deterministic ontological model.

We now demonstrate the existence of a quantum ana- 
logue of this argument in the case of n = 5. Specifically, 
we demonstrate that for the set of observables in Klyachko's proof, specified in Eq. (24) and depicted in Fig. 1, there is a quantum state such that

$$X_1 = 1 \Rightarrow X_2 = 0 \Rightarrow X_3 = 1 \Rightarrow X_4 = 0 \Rightarrow X_5 = 1 \Rightarrow X_1 = 0. \tag{30}$$

First, note that an inference from $X_a = 1$ to $X_{a\oplus1} = 0$ can be made independently of the quantum state, because for any pair of orthogonal projectors, at most one of them can take the value 1. However, an inference from $X_a = 0$ to $X_{a\oplus1} = 1$ is only true for certain quantum states, because a pair of projectors may both be assigned the value 0. To ensure that $X_a = 0$ implies $X_{a\oplus1} = 1$, we must choose a quantum state that lies in the span of the vectors $|l_a\rangle$ and $|l_{a\oplus1}\rangle$ in Hilbert space. This way, the vector orthogonal to this span is assigned value 0, so that if $|l_a\rangle$ is assigned value 0, $|l_{a\oplus1}\rangle$ must be assigned the value 1. Starting with an assignment of $X_1 = 1$, we need to make the inference from $X_a = 0$ to $X_{a\oplus1} = 1$ twice in the pentagram: from $X_2 = 0$ to $X_3 = 1$ and from $X_4 = 0$ to $X_5 = 1$. Consequently, we need a quantum state that lies in the subspace (plane) spanned by $|l_2\rangle$ and $|l_3\rangle$ but also in the subspace spanned by $|l_4\rangle$ and $|l_5\rangle$. Fortunately, these subspaces intersect on a ray (see Fig. 1), and therefore we take the quantum state to be the one associated with that ray, indicated in Fig. 1 as $|\psi_2\rangle$.

Therefore, assuming a preparation of the state $|\psi_2\rangle$, we have the sequence of implications of Eq. (30). By the transitivity of implication, we can conclude that $X_1 = 1 \Rightarrow X_1 = 0$. Given that $X_1 = 1$ is assigned non-zero probability by $|\psi_2\rangle$, specifically, $p = 1 - 2/\sqrt{5} \approx 0.1056$, we have derived a contradiction from the assumption of the transitivity of implication, and therefore also from the assumption of an ontological model that is measurement-noncontextual and outcome-deterministic for projective measurements (i.e. traditionally-noncontextual).
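The state $|\psi_2\rangle$ and the probability $p \approx 0.1056$ can be checked numerically. In the sketch below (plain Python; helper names are ours), the plane spanned by $|l_2\rangle$ and $|l_3\rangle$ is represented by its normal, and likewise for $|l_4\rangle$ and $|l_5\rangle$; the common ray is then the cross product of the two normals. These two normals are the vectors $|\chi\rangle$ and $|\chi'\rangle$ that appear in the comparison with Clifton's proof.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(u):
    norm = math.sqrt(dot(u, u))
    return tuple(x / norm for x in u)

cos_t = 5 ** -0.25
sin_t = math.sqrt(1 - cos_t ** 2)
l = [(sin_t * math.cos(4 * math.pi * a / 5), sin_t * math.sin(4 * math.pi * a / 5), cos_t)
     for a in range(1, 6)]                      # l[0] = |l_1>, ..., l[4] = |l_5>

chi = normalize(cross(l[1], l[2]))              # orthogonal to span(|l_2>, |l_3>)
chi_prime = normalize(cross(l[3], l[4]))        # orthogonal to span(|l_4>, |l_5>)
# The ray lying in both planes is orthogonal to both normals.
psi2 = normalize(cross(chi, chi_prime))

p_x1 = dot(l[0], psi2) ** 2                     # Born-rule probability that X_1 = 1
print(p_x1, 1 - 2 / math.sqrt(5))
```

The overall sign of `psi2` is arbitrary, which is harmless because only the squared overlap enters the probability.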



This is a proof of the Kochen-Specker theorem which is 
analogous to Hardy's proof of Bell's theorem, described 
in Sec. IV E. Interestingly, it is not possible to generalize this type of proof to the case of $n > 5$ using a set of vectors that form an $\{n/\frac{n-1}{2}\}$ star polygon. For instance, in the case of $n = 7$, if we start with $X_1 = 1$, in order to make the inference from $X_2 = 0$ to $X_3 = 1$, from $X_4 = 0$ to $X_5 = 1$ and from $X_6 = 0$ to $X_7 = 1$, the quantum state would have to lie in each of the following three subspaces: the one spanned by $|l_2\rangle$ and $|l_3\rangle$, the one spanned by $|l_4\rangle$ and $|l_5\rangle$, and the one spanned by $|l_6\rangle$ and $|l_7\rangle$. But although any pair of these subspaces intersect along a ray, the three do not, so there is no quantum state that does the job.
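The failure for $n = 7$ can also be seen numerically. Three planes through the origin of $\mathbb{R}^3$ share a common ray exactly when their normals are linearly dependent. The sketch below (helper names are illustrative) builds the $\{7/3\}$ vectors and forms the normals of the three required planes. It then checks that each pair of normals is non-parallel, so each pair of planes meets in a ray, while the determinant of all three is nonzero, so no single ray lies in all three.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(u):
    norm = math.sqrt(dot(u, u))
    return tuple(x / norm for x in u)

def det3(rows):
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def star_vectors(n):
    c = math.cos(math.pi / n)
    cos_t = math.sqrt(c / (1 + c))
    sin_t = math.sqrt(1 - cos_t ** 2)
    return [(sin_t * math.cos((n - 1) * math.pi * a / n),
             sin_t * math.sin((n - 1) * math.pi * a / n), cos_t) for a in range(1, n + 1)]

l = star_vectors(7)                                               # l[0] = |l_1>, ..., l[6] = |l_7>
normals = [normalize(cross(l[i], l[i + 1])) for i in (1, 3, 5)]   # planes (2,3), (4,5), (6,7)

# Each pair of distinct planes meets in a ray: the cross product of their normals is nonzero.
pairwise = [math.sqrt(dot(w, w)) for w in (cross(normals[0], normals[1]),
                                           cross(normals[0], normals[2]),
                                           cross(normals[1], normals[2]))]
# A ray common to all three planes would need the three normals to be linearly dependent.
det = det3(normals)
print(pairwise, det)
```

For $n = 5$ only two planes are involved, and two distinct planes through the origin always share a ray, which is why the construction succeeds there.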

The state-specific Kochen-Specker proof we have just 
presented turns out to be related to Clifton's 8-ray 
Kochen-Specker proof [2*3]. The latter makes use of 
the famous 8-vertex subgraph of the original 117-vertex Kochen-Specker proof [1]. Clifton's proof also has an interesting connection with the pre- and post-selection effect known as the "three-box paradox" [28], as shown in Ref. [29]. A connection between Klyachko's Kochen-Specker proof and the 8-ray proof (as well as Hardy's nonlocality proof) has also been noted previously in Ref. [30].

To see how our proof is related to Clifton's, let us denote the vector orthogonal to the span of $|l_2\rangle$ and $|l_3\rangle$ by $|\chi\rangle$ and the one orthogonal to the span of $|l_4\rangle$ and $|l_5\rangle$ by $|\chi'\rangle$. Then the orthogonality relations of the eight vectors $\{|l_1\rangle, |l_2\rangle, |l_3\rangle, |l_4\rangle, |l_5\rangle, |\chi\rangle, |\chi'\rangle, |\psi_2\rangle\}$ are summarized by the diagram in Fig. 3 (where nodes represent rays and the presence of an edge represents orthogonality). In an outcome-deterministic measurement-noncontextual model, every vector must receive a value 0 or 1, with exactly one member of every orthogonal triple receiving the value 1, and no more than one member of an orthogonal pair receiving the value 1.




FIG. 3. Clifton's 8-ray state-specific Kochen-Specker proof. 

A slightly different way of seeing the contradiction is that transitivity of implication specifies that $X_1 = 1$ implies $X_5 = 1$, whereas by a joint measurement of $X_1$ and $X_5$, we would infer that $X_1 = 1$ implies $X_5 = 0$.

Clifton's proof can then be phrased as follows. Given a preparation of $|\psi_2\rangle$, the vector $|\psi_2\rangle$ (considered as a measurement outcome) must be assigned the value 1, and the vector $|l_1\rangle$ has a nonzero probability of being assigned the value 1. We denote the value assigned to vector $|\phi\rangle$ by $v(|\phi\rangle)$. From $v(|\psi_2\rangle) = 1$ we infer $v(|\chi\rangle) = v(|\chi'\rangle) = 0$, and from $v(|l_1\rangle) = 1$ (which happens with nonzero probability) we infer $v(|l_2\rangle) = v(|l_5\rangle) = 0$. One then concludes from $v(|\chi\rangle) = 0$ and $v(|l_2\rangle) = 0$ that $v(|l_3\rangle) = 1$, and from $v(|\chi'\rangle) = 0$ and $v(|l_5\rangle) = 0$ that $v(|l_4\rangle) = 1$. However, $v(|l_3\rangle) = 1$ and $v(|l_4\rangle) = 1$ is a contradiction, because $|l_3\rangle$ and $|l_4\rangle$ are orthogonal.

This is the standard way of deriving a contradiction for the eight rays in Clifton's proof; however, one could equally well use the fact that $v(|\chi\rangle) = v(|\chi'\rangle) = 0$ and $v(|l_1\rangle) = 1$ to justify anti-correlation across every edge around the ring $\{|l_1\rangle, |l_2\rangle, |l_3\rangle, |l_4\rangle, |l_5\rangle\}$, which is just the proof we have presented above.
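The contradiction can also be confirmed by exhaustive search. The sketch below encodes the orthogonality relations described above: the pentagram edges, $|\chi\rangle$ orthogonal to $|l_2\rangle$ and $|l_3\rangle$, $|\chi'\rangle$ orthogonal to $|l_4\rangle$ and $|l_5\rangle$, and $|\psi_2\rangle$ orthogonal to both $|\chi\rangle$ and $|\chi'\rangle$. It treats $\{|l_2\rangle, |l_3\rangle, |\chi\rangle\}$ and $\{|l_4\rangle, |l_5\rangle, |\chi'\rangle\}$ as orthogonal bases and enumerates all $2^8$ value assignments. The edge list is our reading of Fig. 3 and should be treated as an assumption; the ray labels are illustrative.

```python
from itertools import product

RAYS = ("l1", "l2", "l3", "l4", "l5", "chi", "chi_p", "psi2")
# Orthogonal pairs (edges of the orthogonality graph).
EDGES = (("l1", "l2"), ("l2", "l3"), ("l3", "l4"), ("l4", "l5"), ("l5", "l1"),
         ("chi", "l2"), ("chi", "l3"), ("chi_p", "l4"), ("chi_p", "l5"),
         ("psi2", "chi"), ("psi2", "chi_p"))
# Orthogonal triples, i.e. complete bases of the 3-dimensional space.
TRIPLES = (("l2", "l3", "chi"), ("l4", "l5", "chi_p"))

def is_valid(v):
    """Noncontextual 0/1 rules: at most one 1 per orthogonal pair, exactly one 1 per basis."""
    if any(v[a] + v[b] > 1 for a, b in EDGES):
        return False
    return all(sum(v[r] for r in t) == 1 for t in TRIPLES)

# All admissible assignments in which the prepared state |psi_2> receives the value 1.
valid = []
for bits in product((0, 1), repeat=len(RAYS)):
    v = dict(zip(RAYS, bits))
    if is_valid(v) and v["psi2"] == 1:
        valid.append(v)

print(len(valid), sorted({v["l1"] for v in valid}))   # every admissible assignment has l1 = 0
```

Since quantum theory assigns $|l_1\rangle$ a nonzero probability in the state $|\psi_2\rangle$, no mixture of these admissible assignments can reproduce the quantum statistics.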



IV. NO-GO THEOREMS FOR BELL-LOCAL 
MODELS 

A. A separated pair of single-query 3-box systems 

In this section, we consider another variation on 
Specker's parable. The seer has a novel 3-box system
which allows only a single box to be opened, rather than 
two. To distinguish the two types of three-box systems, 
we call the former a single-query system and the latter a 
double-query system. We also assume that the seer can 
prepare a pair of single-query systems that mimic the be- 
havior of the double-query system as follows: if the same 
box is opened on one system as is opened on the other, 
one obtains the same result (both are always found to be 
full, or both empty); if different boxes are opened, then 
one obtains different results (one is always full and the 
other empty). For the benefit of skeptical suitors, the 
seer allows for the queries of the two different systems 
to be implemented at space-like separation. We imagine 
that they are transported to different corners of the As- 
syrian empire: one to Abydos and the other to Babylon. 
The suitor dispatches two of his trusted classmates, one 
to each of these two cities, and instructs them to choose 
a box at random. 

We are therefore imagining a situation wherein two 
observables are measured jointly by first preparing a pair 
of systems in a perfectly correlated state and measuring 
one observable on each. 

As we will demonstrate below, this version of the 
Specker parable allows us to establish a simple proof of 
nonlocality in the same spirit as that presented by Mermin in Ref. [10]. Let us denote the choices made by the two classmates by $a$ and $b$ respectively, taking values in the set $\{1,2,3\}$, corresponding to the choice of box. Further, we denote the results of box $a$ at Abydos and box $b$ at Babylon, respectively, by $A_a$ and $B_b$, taking values in $\{1,0\}$ corresponding to the observations {full, empty}. Then we can express the condition that the outcomes must satisfy in this two-wing version of the Specker parable as

$$A_a \oplus B_b = 1 \oplus \delta_{a,b}, \tag{31}$$

where $\delta$ denotes the Kronecker delta function. To quantify the extent to which these correlations are realized, let us define $R_3$ as the weighted sum, assuming $a$ and $b$ are chosen uniformly at random, of the probability of achieving perfect negative correlation when $a \neq b$ and the probability of achieving perfect correlation when $a = b$. That is,

$$R_3 \equiv \sum_{a,b:\,b\neq a}\frac{1}{9}\,p(A_a\neq B_b|M_a,M_b;P) + \sum_{a,b:\,b=a}\frac{1}{9}\,p(A_a=B_b|M_a,M_b;P), \tag{32}$$

where $p(A_a=B_b|M_a,M_b;P)$ refers to the probability of finding $A_a = B_b$ conditioned on box $a$ being opened at Abydos and box $b$ being opened at Babylon; likewise for $p(A_a\neq B_b|M_a,M_b;P)$.

The OS correlations described in the two-wing Specker parable can be summarized as

$$\begin{aligned} \sum_{A_a,B_b:\,B_b=A_a} p(A_a,B_b|M_a,M_b;P) &= 1 \quad \text{for } a=b,\\ \sum_{A_a,B_b:\,B_b\neq A_a} p(A_a,B_b|M_a,M_b;P) &= 1 \quad \text{for } a\neq b. \end{aligned} \tag{33}$$

The assumption that $M_a$ and $M_b$ are jointly measurable in the sense of definition (1) implies that they must satisfy a condition of no superluminal signaling [17, 31], namely,

$$\begin{aligned} p(A_a|M_a;P) &= \sum_{B_b} p(A_a,B_b|M_a,M_b;P) \quad \forall b,\\ p(B_b|M_b;P) &= \sum_{A_a} p(A_a,B_b|M_a,M_b;P) \quad \forall a, \end{aligned} \tag{34}$$

which asserts that the conditional marginal probabilities $p(A_a|M_a;P)$ obtained by summing over $B_b$ are independent of the choice of the distant measurement procedure $M_b$, and likewise for $p(B_b|M_b;P)$. It is simple to show, as we do in Appendix D, that by imposing the no-signaling condition, the correlations are constrained to be of the following form:

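$$\begin{aligned} \forall a\neq b:&\quad p(0,1|M_a,M_b;P_*) = \tfrac{1}{2},\qquad p(1,0|M_a,M_b;P_*) = \tfrac{1}{2},\\ \forall a=b:&\quad p(0,0|M_a,M_b;P_*) = \tfrac{1}{2},\qquad p(1,1|M_a,M_b;P_*) = \tfrac{1}{2}. \end{aligned} \tag{35}$$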

We will henceforth call these the nonlocal OS correlations. The winning probability for Eq. (32) is unity for these correlations, i.e.,

$$R_3^{\rm NLOS} = 1. \tag{36}$$
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They are the only non-signaling correlations that can win 
this prediction game deterministically. This implies, in 
particular, that the nonlocal OS correlations represent 
an extreme point of the convex set of non-signaling cor- 
relations [32I , v ery much like the archetypical PR bo43 
correlations 17J for the scenario where a, b only run from 
1 to 2. Although these correlations do not allow for su- 
pcrluminal signaling, they do violate Bell's assumption 
of local causality [331 ] . as we now demonstrate. 

In order to enforce perfect positive correlations when the suitor's two classmates make the same measurement, the Babylonian system must be prepared with an answer for each possible query that matches the answer that the Abydosian system is prepared to provide. It follows that there are deterministic noncontextual hidden variables determining the outcome on the Babylonian system. This step is familiar from Bell's original derivation of his theorem […]: locality together with the assumption of perfect correlations implies the existence of deterministic noncontextual values for each system. Given such values, it is easy to see¹³ that the overall probability of winning the game in a locally causal model is at most 7/9. That is,

$$R_3 \le R_3^{\mathrm{local}} = \frac{7}{9}. \tag{37}$$



This is a Bell inequality. The fact that $R_3^{\mathrm{NLOS}} = 1$ for the seer's system is a violation of this Bell inequality and a proof that no locally causal model of the nonlocal OS correlations is possible.
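The bound (37) can be checked by brute force. The sketch below assumes, as the text does, that a locally causal model is a convex mixture of deterministic assignments, so that the maximum of $R_3$ is attained by one of the $2^3 \times 2^3$ deterministic strategies. The function `r3` and the 0/1 encoding of {full, empty} are our own illustrative choices.

```python
from fractions import Fraction
from itertools import product

def r3(A, B):
    """R_3 of Eq. (32) for deterministic outcome assignments.

    A[a] and B[b] (in {0, 1}) are the pre-assigned contents of box a at
    Abydos and box b at Babylon; (a, b) is uniform over the 9 pairs.
    """
    wins = 0
    for a, b in product(range(3), repeat=2):
        # Eq. (31): the outcomes should agree iff the same box is opened
        required_xor = 0 if a == b else 1
        if (A[a] ^ B[b]) == required_xor:
            wins += 1
    return Fraction(wins, 9)

# A locally causal model is a mixture of deterministic ones, so the local
# bound is attained at one of the 2^3 x 2^3 deterministic strategies.
strategies = list(product((0, 1), repeat=3))
r3_local = max(r3(A, B) for A in strategies for B in strategies)
s3_local = 18 * (r3_local - Fraction(1, 2))   # S_3 via R_3 = 1/2 + S_3/18

print(r3_local, s3_local)  # 7/9 5
```

The strategy $A = B = (0, 0, 1)$ from footnote 13 is one of the maximizers.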

The Bell inequality (37) can also be written in terms of the more conventional correlation function, or the so-called two-party correlator $\langle A_a B_b\rangle$, where $A_a$, $B_b$ take on values $\{+1, -1\}$ as usual:

$$\langle A_a B_b\rangle = \sum_{A_a,B_b:\,B_b=A_a} p(A_a,B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}) - \sum_{A_a,B_b:\,B_b\neq A_a} p(A_a,B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}). \tag{38}$$

The two-party correlator is simply the average value of the product of the result in Abydos when box $a$ was chosen and the result in Babylon when box $b$ was chosen. Together with the normalization condition $\sum_{A_a,B_b} p(A_a,B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}) = 1$, we can now re-express the winning probability as $R_3 = \frac{1}{2} + \frac{S_3}{18}$, where

$$S_3 = \sum_{a=b}\langle A_a B_b\rangle - \sum_{a\neq b}\langle A_a B_b\rangle. \tag{39}$$

In this notation, it is again easy to verify that if the variables $A_a$, $B_b$ admit pre-existing values $\pm 1$ (i.e., are determined by hidden variables), then $S_3 \le 5$ [cf. Eq. (37)]. (As is now well known […], the same bound also applies to any locally causal model in which the values of the variables are determined by stochastic hidden variables.) Specker's correlations require $S_3 = 9$, thus clearly violating the Bell inequality.

B. Mermin's proof of Bell's theorem

What about correlations allowed in quantum theory? We know from a celebrated theorem by Cleve et al. (Theorem 5.12, Ref. [34]) that there is no quantum strategy that can give unit winning probability. While it is not possible to realize the overprotective seer parable as formulated above, it is nevertheless possible to demonstrate, using quantum mechanics, correlations that approximate the desired correlations better than any locally causal model can. As it turns out, the largest winning probability allowed by quantum theory is (see Appendix E for details)

$$R_3 \le R_3^{\mathrm{quantum}} = \frac{5}{6}, \tag{40}$$



12 The terminology "box" is, in the present circumstances, unfortunate. It refers to a "black box" (i.e., an unexplained, indeed inexplicable, source of correlations) between two distant parties, just as in our above scenario.

13 For instance, if the values assigned to the state of boxes 1, 2, and 3 are 0, 0, 1, then we have positive correlation for $(a,b) \in \{(1,1),(2,2),(3,3)\}$ and negative correlation for $(a,b) \in \{(1,3),(3,1),(2,3),(3,2)\}$, in accordance with the nonlocal OS correlations. However, we also have positive correlation for $(a,b) \in \{(1,2),(2,1)\}$, in disagreement with the nonlocal OS correlations. Thus the correlations are correct in only 7 out of 9 cases. Alternative strategies that do not enforce perfect correlation when $a = b$ cannot do any better. For instance, if all boxes on the left are empty and all boxes on the right are full, then one has 6 out of 6 anti-correlations but 0 out of 3 correlations, leaving a winning probability of 2/3, which is smaller than 7/9.



and hence $S_3^{\mathrm{quantum}} = 6$ (which exceed the Bell-local bounds of 7/9 and 5, respectively). That quantum theory allows such nontrivial correlations can be verified by considering the two-qubit maximally entangled state $|\Phi\rangle = \frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle)$ (in the $\sigma_z$ basis) and letting $A_1$, $A_2$, and $A_3$ be the results of measuring the three Pauli operators equally spaced in the $z$-$x$ plane, defined by

$$A_a = \cos\frac{2\pi(a-1)}{3}\,\sigma_z + \sin\frac{2\pi(a-1)}{3}\,\sigma_x; \tag{41}$$

likewise for the $B_b$, which are defined identically. Thus quantum mechanics allows us to move towards the extremal nonlocal correlations in our formulation of the parable. This proof that quantum theory violates Bell locality (Bell's theorem) is in fact the one popularized by David Mermin […].
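A minimal numerical sketch of the quantum strategy just described, assuming NumPy and real $2\times 2$ Pauli matrices: it evaluates the correlators (38) for the observables (41) on $|\Phi\rangle$, then $S_3$ from Eq. (39) and $R_3 = \frac{1}{2} + S_3/18$. The helper names `observable` and `correlator` are hypothetical.

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=float)
sx = np.array([[0, 1], [1, 0]], dtype=float)

def observable(angle):
    """Pauli observable cos(angle) sigma_z + sin(angle) sigma_x, cf. Eq. (41)."""
    return np.cos(angle) * sz + np.sin(angle) * sx

# |Phi> = (|00> + |11>)/sqrt(2) in the sigma_z basis
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

def correlator(Oa, Ob):
    """Two-party correlator <A_a B_b> of Eq. (38) evaluated on |Phi>."""
    return phi_plus @ np.kron(Oa, Ob) @ phi_plus

angles = [2 * np.pi * (a - 1) / 3 for a in (1, 2, 3)]
E = np.array([[correlator(observable(x), observable(y)) for y in angles]
              for x in angles])

# Eq. (39): same-box correlators minus different-box correlators
S3 = np.trace(E) - (E.sum() - np.trace(E))
R3 = 0.5 + S3 / 18
print(f"S3 = {S3:.6f}, R3 = {R3:.6f}")  # S3 = 6.000000, R3 = 0.833333
```

Each off-diagonal correlator equals $\cos(\pm 2\pi/3) = -1/2$, which is what pushes $S_3$ from the local value 5 up to 6.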



1. Generalization to all odd n 

It is straightforward to generalize this new parable to the case of $n$ boxes for all odd $n \ge 5$. Specifically, posit a separated pair of $n$-box rings such that if the box that is opened in Abydos is the same as the one opened in Babylon, the outcomes agree, while if the index $a$ of the box opened in Abydos differs by 1 from the index $b$ of the one opened in Babylon, that is, if $b = a \oplus 1$ or $a = b \oplus 1$ (where $\oplus$ denotes addition modulo $n$), then the outcomes disagree. As it turns out, the correlations must be of the form¹⁴

$$\begin{aligned}
\forall\, b = a \oplus 1 \text{ or } a = b \oplus 1:&\quad p(1,0|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}) = p(0,1|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}) = \tfrac{1}{2},\\
\forall\, a = b:&\quad p(0,0|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}) = p(1,1|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}) = \tfrac{1}{2}.
\end{aligned} \tag{42}$$

We do not specify the nature of the correlation for other 
values of a and b. 

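We can define the average probability of success as

$$R_n = \frac{1}{3n}\sum_{a,b:\,b=a\oplus 1 \text{ or } a=b\oplus 1} p(A_a \neq B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}) + \frac{1}{3n}\sum_{a,b:\,b=a} p(A_a = B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}). \tag{43}$$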



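It is evident that with a local strategy, if one has perfect correlation when $a = b$, then when $a = b \oplus 1$ or $b = a \oplus 1$, one can have perfect anti-correlation with probability at most $(n-1)/n$ (the common values then live on an odd ring, which cannot alternate all the way around, so at least one of the $n$ adjacent pairs carries equal values). Furthermore, no local strategy can do any better than this. Consequently, given that the conditions $a = b$, $a = b \oplus 1$, and $b = a \oplus 1$ arise with probability 1/3 each, the winning probability with a local strategy is upper bounded by¹⁵

$$R_n \le R_n^{\mathrm{local}} = \frac{1}{3} + \frac{2}{3}\,\frac{n-1}{n} = 1 - \frac{2}{3n}. \tag{44}$$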



Quantum theory can violate this inequality. Using the same entangled state as above, we generalize Eq. (41) to

$$A_a = \cos\varphi_a\,\sigma_z + \sin\varphi_a\,\sigma_x, \tag{45}$$

where $\varphi_a = \frac{(n-1)\pi}{n}(a-1)$, and likewise for the $B_b$ (Fig. 4). There are $3n$ kinds of measurement statistics that appear in $R_n$. We consider each in turn. For the $n$ terms wherein $a = b$, we obtain perfect correlation with probability 1, while for the $n$ terms wherein $a = b \oplus 1$ and the $n$ terms wherein $b = a \oplus 1$, we obtain anti-correlated outcomes with probability $\cos^2(\pi/2n)$ (on $|\Phi\rangle$ one has $\langle A_a B_b\rangle = \cos(\varphi_a - \varphi_b)$, and adjacent settings are separated by an angle equivalent to $\pi - \pi/n$ modulo $2\pi$). In all, then, we find the corresponding probability of success to be

$$R_n^{\mathrm{quantum}} = \frac{1}{3} + \frac{2}{3}\cos^2\frac{\pi}{2n} \approx 1 - \frac{\pi^2}{6n^2}. \tag{46}$$

14 The proof of this proceeds analogously to the one given in Appendix D for the specific case of $n = 3$.

15 It is worth noting that none of the following Bell inequalities is facet-inducing (following the terminology of Ref. [35]) or tight (following the terminology of Refs. [36, 37]). That is, they do not correspond to the boundary of the set of locally causal correlations with maximal dimension.



Once again, for a large number of suitors, the seer can 
choose n, the number of measurement settings, to ensure 
that with very high probability all of the suitors will lose, 
despite their classically founded expectation that one of 
their number is very likely to win. 
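The local bound (44) and the quantum value (46) can be reproduced for small odd $n$. This sketch brute-forces all $2^{2n}$ deterministic local strategies (so it is only practical for small $n$) and uses the fact that, on $|\Phi\rangle$, two $z$-$x$ plane observables with angles $\varphi_a$ and $\varphi_b$ have correlator $\cos(\varphi_a - \varphi_b)$. Boxes are indexed from 0 here, and the function names are our own.

```python
import numpy as np
from fractions import Fraction
from itertools import product

def settings(n):
    """The 3n (a, b) pairs of Eq. (43); boxes 0..n-1, a ⊕ 1 taken modulo n."""
    same = [(a, a) for a in range(n)]
    adjacent = [(a, (a + 1) % n) for a in range(n)] + \
               [((b + 1) % n, b) for b in range(n)]
    return same, adjacent

def r_local(n):
    """Maximum of R_n over the 2^(2n) deterministic local strategies."""
    same, adjacent = settings(n)
    best = 0
    for A in product((0, 1), repeat=n):
        for B in product((0, 1), repeat=n):
            wins = sum(A[a] == B[b] for a, b in same) + \
                   sum(A[a] != B[b] for a, b in adjacent)
            best = max(best, wins)
    return Fraction(best, 3 * n)

def r_quantum(n):
    """R_n for |Phi> and the observables of Eq. (45)."""
    same, adjacent = settings(n)
    phi = [(n - 1) * np.pi / n * a for a in range(n)]   # phi_{a+1} of Eq. (45)

    def corr(a, b):
        # <A_a B_b> = cos(phi_a - phi_b) on the maximally entangled state
        return np.cos(phi[a] - phi[b])

    wins = sum((1 + corr(a, b)) / 2 for a, b in same) + \
           sum((1 - corr(a, b)) / 2 for a, b in adjacent)
    return wins / (3 * n)

for n in (3, 5, 7):
    print(n, r_local(n), round(float(r_quantum(n)), 4))
```

For $n = 3$ this recovers the values 7/9 and 5/6 obtained above.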





FIG. 4. Representation of the observables used in the 
Mermin-type proofs of nonlocality for n=3, 5 and 7. The ob- 
servables are depicted by lines in a plane of the Bloch sphere. 
The vertices of each line correspond to the eigenvectors of the 
associated observable, with the labeled vertex associated with 
eigenvalue 1. 



C. Connection to previous work 

An analogous game is discussed by Vaidman [38], who considers a slightly different narrative device: a necklace having an even number $n$ of beads, each of which can be one of two colors, such that one finds all adjacent beads to be of different colors except for the first and last beads, which are of the same color. It is clear that by replacing the first and last beads by a single bead, we have precisely the correlations considered above. Another variation of the game was considered by Braunstein and Caves [39]: there, perfect correlation is required for all adjacent pairs of measurements except that between the first and the last, in which case perfect anti-correlation is required. This game gives rise to the so-called "chained Bell inequalities" [39].
The problem of maximizing the winning probability $R_n$ is also relevant to the strength of a two-prover interactive proof system of the type described by Cleve et al. [34]. The two provers are taken to be two agents of the seer (one sent to Abydos and the other to Babylon), while the suitor is the verifier. The provers' task is to convince the verifier that a cyclic graph with an odd number $n$ of vertices is 2-colorable (despite the fact that it is not). The verifier sends the name of a vertex to each prover such that the two vertices are either the same or adjacent. The provers, who cannot communicate with one another, must each respond with a color. The existence of systems generating the nonlocal OS correlations would provide the provers with a perfect winning strategy.

Cleve et al. have analyzed a two-player interactive proof, called the odd cycle game, which is very similar to the one we consider here. The odd cycle game is another natural generalization of Specker's parable to a pair of systems where, for a given measurement on the Abydosian system, there are two rather than three options for the measurement on the Babylonian system: it is the same, i.e., $b = a$, or it has index one higher, i.e., $b = a \oplus 1$. The possibility of $a = b \oplus 1$, which is allowed in the game we have considered and ensures symmetry between the two players, is excluded in the odd cycle game.¹⁶
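The odd cycle game admits the same kind of check. In this sketch the $2n$ question pairs $(a, a)$ and $(a, a \oplus 1)$ are equally likely. For the quantum value we assume, as our reading of the optimal strategy of Ref. [34] described in footnote 16, that Bob's Bloch vectors are those of Eq. (45) rotated by $\pi/2n$ in the $z$-$x$ plane; the result is compared with $\cos^2(\pi/4n)$ and with the local value $1 - 1/(2n)$. Function names are hypothetical.

```python
import numpy as np
from fractions import Fraction
from itertools import product

def odd_cycle_pairs(n):
    """Questions (a, a) and (a, a ⊕ 1), vertices 0..n-1, all 2n equally likely."""
    return [(a, a) for a in range(n)] + [(a, (a + 1) % n) for a in range(n)]

def odd_cycle_local(n):
    """Best deterministic (hence best local) winning probability."""
    pairs = odd_cycle_pairs(n)
    best = 0
    for A in product((0, 1), repeat=n):
        for B in product((0, 1), repeat=n):
            # win iff the two colors agree exactly when the same vertex was asked
            wins = sum((A[a] == B[b]) == (a == b) for a, b in pairs)
            best = max(best, wins)
    return Fraction(best, 2 * n)

def odd_cycle_quantum(n, shift=None):
    """Winning probability on |Phi> with Alice's observables from Eq. (45) and
    Bob's Bloch vectors rotated by `shift` (assumed pi/(2n) by default)."""
    if shift is None:
        shift = np.pi / (2 * n)
    phi = [(n - 1) * np.pi / n * a for a in range(n)]
    total = 0.0
    for a, b in odd_cycle_pairs(n):
        corr = np.cos(phi[a] - (phi[b] + shift))   # <A_a B_b> on |Phi>
        total += (1 + corr) / 2 if a == b else (1 - corr) / 2
    return total / (2 * n)

for n in (3, 5, 7):
    print(n, odd_cycle_local(n), float(odd_cycle_quantum(n)), np.cos(np.pi / (4 * n)) ** 2)
```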



D. From OS correlations to PR-box correlations 

Another way of generalizing the single-query 3-box OS correlations to a separated pair of parties is to imagine that each party has a 3-box system, but the first party only ever opens the first or second box, while the second party only ever opens the second or third box. If we imagine that there is correlation when they both open the second box and anti-correlation otherwise, then this set of measurements is already sufficient to obtain a contradiction with a local model. Specifically, the local deterministic values must satisfy

$$A_2 B_2 = +1, \quad A_1 B_2 = -1, \quad A_1 B_3 = -1, \quad A_2 B_3 = -1, \tag{47}$$

but the product of the left-hand sides is $A_1^2 A_2^2 B_2^2 B_3^2 = +1$, while the product of the right-hand sides is $-1$. The correlations of Eq. (47) are precisely the PR box correlations [17] that have been extensively studied in recent years.

16 The upper bound on the winning probability with a local strategy for the odd cycle game is clearly $R_n^{\mathrm{local}} \le \frac{1}{2} + \frac{1}{2}\,\frac{n-1}{n} = 1 - \frac{1}{2n}$. The maximal quantum violation, which is determined in Ref. [34], is achieved if the measurements on Alice's system are the spin operators $A_a$ in Eq. (45), while the measurements on Bob's system are the spin operators $B_b$ in Eq. (45) rotated by an angle of $\pi/4n$ (as measured between their eigenvectors, which corresponds to a rotation of their Bloch vectors by $\pi/2n$ in the $z$-$x$ plane). In this case, for the $n$ terms wherein $a = b$, we have correlation with probability $\cos^2(\pi/4n)$, and for the $n$ terms wherein $b = a \oplus 1$, we have anti-correlation with probability $\cos^2(\pi/4n)$, such that $R_n^{\mathrm{quantum}} = \cos^2(\pi/4n) \approx 1 - \frac{\pi^2}{16n^2}$.
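The parity argument for Eq. (47) can be confirmed exhaustively. The sketch below tries all 16 assignments of $\pm 1$ values to $A_1, A_2, B_2, B_3$ and counts how many of the four relations each one satisfies; the dictionary encoding is ours.

```python
from itertools import product

# Eq. (47): required products of the +/-1 values, keyed by (Abydos, Babylon) observable
required = {("A2", "B2"): +1, ("A1", "B2"): -1, ("A1", "B3"): -1, ("A2", "B3"): -1}

best = 0
for A1, A2, B2, B3 in product((+1, -1), repeat=4):
    value = {"A1": A1, "A2": A2, "B2": B2, "B3": B3}
    satisfied = sum(value[x] * value[y] == target for (x, y), target in required.items())
    best = max(best, satisfied)

print(best)  # 3: every deterministic assignment violates at least one relation
```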



E. Hardy-type no-go theorems for Bell-local models



In outcome-deterministic ontological models that are local or noncontextual, implications among value assignments of observables are transitive because these value assignments do not depend on the context (local or remote) of the measurement. The failure of the transitivity of implication therefore implies the impossibility of such models. Again, we find that this conclusion has been reached before in the literature on nonlocality. Specifically, the Hardy-type proof of nonlocality [11] can be expressed in this fashion [40], a fact that was first noted by Stapp [41] (for a simplified account, see Refs. [42, 43]).

We begin by presenting Hardy's proof of nonlocality in its standard form. It uses a pair of binary-outcome observables on each wing of the experiment. Hardy demonstrated a way of choosing these observables such that, for any partially entangled pure state, the correlations between these observables satisfy

$$A_1 = 1 \Rightarrow B_1 = 1, \tag{48}$$

$$B_2 = 1 \Rightarrow A_2 = 1, \tag{49}$$

while

$$\text{sometimes } (A_1 = 1 \text{ and } B_2 = 1) \tag{50}$$

(i.e., with probability $p_{\mathrm{Hardy}} = p(A_1 = 1 \text{ and } B_2 = 1) > 0$), and

$$\text{never } (A_2 = 1 \text{ and } B_1 = 1). \tag{51}$$



We can express this as a failure of the transitivity of implication as follows. From Eqs. (48), (51), and (49) (in its contrapositive form), we infer, respectively,

$$A_1 = 1 \Rightarrow B_1 = 1, \tag{52}$$

$$B_1 = 1 \Rightarrow A_2 = 0, \tag{53}$$

$$A_2 = 0 \Rightarrow B_2 = 0, \tag{54}$$

which we summarize graphically by

$$A_1 = 1 \Rightarrow B_1 = 1 \Rightarrow A_2 = 0 \Rightarrow B_2 = 0.$$

If transitivity held, then these three inferences would imply that

$$A_1 = 1 \Rightarrow B_2 = 0. \tag{55}$$
However, this contradicts Eq. (50), and consequently transitivity must fail. More explicitly, taking $\Rightarrow$ to be material implication, the negation of Eq. (55) is the conjunction of $A_1 = 1$ and $B_2 = 1$,

$$\neg(A_1 = 1 \Rightarrow B_2 = 0) = (A_1 = 1 \text{ and } B_2 = 1), \tag{56}$$

so that the probability $p_{\mathrm{Hardy}} = p(A_1 = 1 \text{ and } B_2 = 1)$ quantifies the frequency with which the transitivity of implication fails.
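To see conditions (48)-(51) realized concretely, the sketch below uses one standard choice, assumed here rather than taken from the text: Hardy's state $(|00\rangle + |01\rangle + |10\rangle)/\sqrt{3}$, with $A_1$ and $B_2$ measured in the $\{|+\rangle, |-\rangle\}$ basis (outcome 1 being $|-\rangle$) and $A_2$ and $B_1$ in the $\{|0\rangle, |1\rangle\}$ basis (outcome 1 being $|1\rangle$). For this choice the three forbidden events have zero probability and $p_{\mathrm{Hardy}} = 1/12$.

```python
import numpy as np

zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (zero + one) / np.sqrt(2), (zero - one) / np.sqrt(2)

# Hardy's state (|00> + |01> + |10>)/sqrt(3), written in the sigma_z basis
psi = (np.kron(zero, zero) + np.kron(zero, one) + np.kron(one, zero)) / np.sqrt(3)

# Illustrative choice: eigenvector for each outcome of each observable.
A = {1: {0: plus, 1: minus}, 2: {0: zero, 1: one}}
B = {1: {0: zero, 1: one}, 2: {0: plus, 1: minus}}

def p(a, x, b, y):
    """Joint probability p(A_a = x and B_b = y) on psi."""
    return abs(np.kron(A[a][x], B[b][y]) @ psi) ** 2

violations = {
    "(48): p(A1=1, B1=0)": p(1, 1, 1, 0),   # must vanish for A1=1 => B1=1
    "(49): p(A2=0, B2=1)": p(2, 0, 2, 1),   # must vanish for B2=1 => A2=1
    "(51): p(A2=1, B1=1)": p(2, 1, 1, 1),   # must vanish ("never")
}
p_hardy = p(1, 1, 2, 1)                      # Eq. (50); analytically 1/12

for label, value in violations.items():
    print(label, f"{value:.2e}")
print("p_Hardy =", round(p_hardy, 6))
```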

We now consider the status of this sort of proof for the PR box. By relabeling the outcomes of the standard PR box, one can obtain correlations of the form

$$A_1 = B_1, \tag{57}$$

$$A_1 = B_2, \tag{58}$$

$$A_2 = B_1 \oplus 1, \tag{59}$$

$$A_2 = B_2, \tag{60}$$

with marginals of the form $p(A_1 = 0) = p(A_2 = 0) = p(B_1 = 0) = p(B_2 = 0) = 1/2$. Eqs. (57), (59), and (60) imply the inferences of Eqs. (52), (53), and (54), respectively. Meanwhile, Eq. (58), together with the fact that $p(A_1 = 1) = 1/2$, implies that sometimes $A_1 = 1$ and $B_2 = 1$, or equivalently, that sometimes Eq. (55) fails, so that we have a contradiction with transitivity. Indeed, the probability of this occurring is $p_{\mathrm{Hardy}} = p(A_1 = 1 \text{ and } B_2 = 1) = 1/2$.

Actually, $p_{\mathrm{Hardy}}$ only quantifies the probability for one particular kind of contradiction, which requires $A_1 = 1$ to get going. In the rest of the cases, where $A_1 = 0$, we still obtain a contradiction because Eqs. (57), (59), and (60) also imply inferences of the form of Eqs. (52), (53), and (54) with $A_a \to A_a \oplus 1$ and $B_b \to B_b \oplus 1$. Transitivity then implies that $A_1 = 0 \Rightarrow B_2 = 1$, while Eq. (58) contradicts this. So one obtains a contradiction with certainty for the PR box.

There is another aspect of these PR box implications that cannot be emulated by quantum theory, which has recently been pointed out by Fritz [44]: if one supplements the implications in Eqs. (52)–(54) with the implication $B_2 = 1 \Rightarrow A_1 = 1$, or with either of the two reverse implications, that is, $A_2 = 0 \Rightarrow B_1 = 1$ or $B_2 = 0 \Rightarrow A_2 = 0$, then the resulting set of constraints cannot be satisfied by any quantum state and set of projective measurements.

As discussed in the introduction, and rehearsed in Sec. III C, Specker introduced his parable of the overprotective seer in order to demonstrate the possibility of a logic wherein there is a failure of the transitivity of implication. One therefore expects that the nonlocal OS correlations from Sec. IV A, which are based on Specker's parable, ought to provide a proof of nonlocality via such a failure of transitivity. This is indeed the case, as we now show. The nonlocal OS correlations, cf. Eq. (35), imply the following chain of implications:

$$A_1 = 1 \Rightarrow B_1 = 1 \Rightarrow A_2 = 0 \Rightarrow B_2 = 0 \Rightarrow A_3 = 1 \Rightarrow B_3 = 1.$$

If the transitivity of implication held, we would have

$$A_1 = 1 \Rightarrow B_3 = 1. \tag{61}$$

However, the perfect anti-correlation between $A_1$ and $B_3$ in the nonlocal OS correlations, together with the fact that $p(A_1 = 1) = 1/2$, cf. Eq. (35), implies that sometimes $A_1 = 1$ and $B_3 = 0$, which contradicts Eq. (61). Indeed, we achieve this contradiction with probability $p_{\mathrm{Hardy}} = p(A_1 = 1 \text{ and } B_3 = 0) = 1/2$. As with the PR box, one can obtain a contradiction with certainty also in the cases where $A_1 = 0$.

Although the nonlocal OS correlations cannot be achieved in quantum theory, it is interesting to ask whether the particular contradiction constructed above might be achieved with some nonzero probability for some choice of state and observables. Indeed, this is possible. In particular, it can be achieved with $p_{\mathrm{Hardy}} \approx 0.17443$ by using the quantum state $|\psi\rangle = (1+\eta^2)^{-1/2}(|0\rangle|0\rangle - \eta|1\rangle|1\rangle)$ and the projectors defined by:



A a = 



^(A)\/+(A) 



B h 



b 



b 



where 



-7=fM0> + |l» 

1 ^(\0) - K 2 \l)) 



f (B) 
b 



i_ 



? (|0>+« 3 |1»: 

"3 

r(-K 3 |0) + |i» 

\n 2 \0) + \l)): 
(|0)-/si|l)):6 = 3, 



a = 1, 
a = 2, 
a = 3, 

:b=l, 
6 = 2, 



K a = ^(«+lmod3) + i j and ^ = ^ 

The above Hardy-type proof of nonlocality via the failure of the transitivity of implications is entirely equivalent to the proof of nonlocality due to Boschi et al. [40]. Note that a slightly stronger contradiction, with $p_{\mathrm{Hardy}} \approx 0.17455$, can be obtained with a different choice of $\eta$ […]. Moreover, this latter value of $p_{\mathrm{Hardy}}$ is only marginally different from the quantum-mechanical upper bound $p_{\mathrm{Hardy}} \le 0.17456$ obtained from the tools of Ref. […]. This suggests that the strongest contradiction in this scenario may already be achievable using a two-qubit partially entangled pure state.¹⁷



17 See, however, Ref. [45] for some strong evidence that in some cases, it may require an infinite-dimensional Hilbert space to achieve the strongest correlations allowed by quantum mechanics, even though the two-qubit correlations are only marginally different from the quantum-mechanical upper bound derived from the tools of Ref. […].

It is also worth noting that by considering a similar setup that involves an increasing number of boxes, and hence a longer chain of intransitive implications, quantum theory actually provides a contradiction with increasing $p_{\mathrm{Hardy}}$ that asymptotes to 50%.

We end this section with a demonstration that there is a particular kind of failure of transitivity that one does not find in quantum theory. We begin by noting that with a PR box, we can get a contradiction with the transitivity of implication in a manner which is different from that of Hardy's proof, and in some ways more striking. In addition to deriving Eqs. (52), (53), and (54) from Eqs. (57), (59), and (60), we can derive from Eq. (58) that

$$B_2 = 0 \Rightarrow A_1 = 0. \tag{62}$$

Graphically, the chain of inferences is

$$A_1 = 1 \Rightarrow B_1 = 1 \Rightarrow A_2 = 0 \Rightarrow B_2 = 0 \Rightarrow A_1 = 0.$$

Were transitivity of implication to hold, we would conclude that $A_1 = 1 \Rightarrow A_1 = 0$, which, together with the fact that $p(A_1 = 1) = 1/2$, yields a contradiction. This sort of proof is also available for the nonlocal OS correlations. It can be characterized as providing a sequence of inferences about values of observables wherein the consequent of the last inference contradicts the antecedent of the first inference. The question is whether this sort of contradiction can be achieved in quantum theory. As it turns out, it cannot, as we now demonstrate.

Consider an arbitrary pure bipartite state $|\Psi\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$, where $\mathcal{H}_A$ and $\mathcal{H}_B$ are Hilbert spaces of dimension $d$. Defining $\{|k\rangle\}_{k=1}^{d}$ to be an orthonormal basis of $\mathcal{H}$, $\rho$ to be a density operator, $\mathbb{1}$ to be the identity operator, and $U$ to be a unitary operator, we can always write $|\Psi\rangle$ in the form

$$|\Psi\rangle = \sum_k (\mathbb{1} \otimes U\sqrt{\rho})\,|k\rangle \otimes |k\rangle. \tag{63}$$

Now suppose that one measures system A with the POVM $\{|\phi\rangle\langle\phi|, \mathbb{1} - |\phi\rangle\langle\phi|\}$ and one obtains the $|\phi\rangle\langle\phi|$ outcome. This leads to an updating of the description of the state of system B to

$$|\chi\rangle = \mathcal{N}_\chi\, U\sqrt{\rho}\,|\phi^*\rangle, \tag{64}$$

where

$$|\phi^*\rangle \equiv \sum_k \langle\phi|k\rangle\,|k\rangle \tag{65}$$

and $\mathcal{N}_\chi$ is a normalization factor. Consequently, a subsequent measurement on system B of the POVM $\{|\chi\rangle\langle\chi|, \mathbb{1} - |\chi\rangle\langle\chi|\}$ will yield the $|\chi\rangle\langle\chi|$ outcome with certainty.

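Next, consider the experiment wherein $\{|\phi\rangle\langle\phi|, \mathbb{1} - |\phi\rangle\langle\phi|\}$ is not made on A, but the measurement $\{|\chi\rangle\langle\chi|, \mathbb{1} - |\chi\rangle\langle\chi|\}$ is made on B and the outcome $|\chi\rangle\langle\chi|$ is obtained. One then updates the description of the state of system A to

$$\begin{aligned}
|\phi'\rangle &\propto (\mathbb{1}\otimes\langle\chi|)(\mathbb{1}\otimes U\sqrt{\rho})\sum_k |k\rangle\otimes|k\rangle\\
&\propto (\mathbb{1}\otimes\langle\phi^*|)(\mathbb{1}\otimes\rho)\sum_k |k\rangle\otimes|k\rangle\\
&\propto (\mathbb{1}\otimes\langle\phi^*|)(\rho^T\otimes\mathbb{1})\sum_k |k\rangle\otimes|k\rangle,\\
|\phi'\rangle &= \mathcal{N}_{\phi'}\,\rho^T|\phi\rangle,
\end{aligned} \tag{66}$$

where the third line uses $(\mathbb{1}\otimes M)\sum_k |k\rangle\otimes|k\rangle = (M^T\otimes\mathbb{1})\sum_k |k\rangle\otimes|k\rangle$, the transpose being taken in the basis $\{|k\rangle\}$, and $\mathcal{N}_{\phi'}$ is a normalization factor. A subsequent measurement on system A of the POVM $\{|\phi'\rangle\langle\phi'|, \mathbb{1} - |\phi'\rangle\langle\phi'|\}$ will then yield the $|\phi'\rangle\langle\phi'|$ outcome with certainty.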

The state $|\chi\rangle$ on B is called the relative state to $|\phi\rangle$ on A given $|\Psi\rangle$ on AB […]. Similarly, $|\phi'\rangle$ on A is the relative state to $|\chi\rangle$ on B given $|\Psi\rangle$ on AB. If we find a particular state on one system, then we are certain to find the relative state on the other should we measure for it. Consequently, we can consider an arbitrary chain of such pairs of measurements, and at every step in the chain we can make a perfect inference from the positive outcome of one to the positive outcome of the other.
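The relative-state formulas (64)-(66) are easy to check numerically. The sketch below assumes NumPy and a randomly drawn full-rank $\rho$ and unitary $U$ (the QR-based unitary is not Haar-distributed, which does not matter here). It stores $|\Psi\rangle$ of Eq. (63) as a $d\times d$ coefficient matrix, performs the two projections directly, and compares the results with $U\sqrt{\rho}\,|\phi^*\rangle$ and $\rho^T|\phi\rangle$. All helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
d = 3  # local dimension (arbitrary choice)

def random_density(d):
    """A random full-rank density operator (illustrative only)."""
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

def random_unitary(d):
    """A unitary from the QR decomposition of a complex Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return Q

def psd_sqrt(rho):
    """Positive square root via the spectral decomposition."""
    w, V = np.linalg.eigh(rho)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def unit(v):
    return v / np.linalg.norm(v)

rho, U = random_density(d), random_unitary(d)
K = U @ psd_sqrt(rho)

# Eq. (63): |Psi> = sum_k |k> (x) K|k>.  Stored as a coefficient matrix,
# Psi[j, m] = (<j| (x) <m|)|Psi> = K[m, j], i.e. Psi = K^T.
Psi = K.T

phi = unit(rng.normal(size=d) + 1j * rng.normal(size=d))

# Outcome |phi><phi| on A leaves B in (<phi| (x) 1)|Psi>: sum_j conj(phi_j) Psi[j, :]
chi = unit(phi.conj() @ Psi)
print(np.allclose(chi, unit(K @ phi.conj())))        # Eqs. (64)-(65): U sqrt(rho)|phi*>

# Outcome |chi><chi| on B leaves A in (1 (x) <chi|)|Psi>: Psi @ conj(chi)
phi_prime = unit(Psi @ chi.conj())
print(np.allclose(phi_prime, unit(rho.T @ phi)))     # Eq. (66): rho^T|phi>

# Given |phi> on A, the relative state |chi> is then found on B with certainty
p_phi = np.linalg.norm(phi.conj() @ Psi) ** 2        # equals <phi|rho^T|phi>
p_joint = abs(np.vdot(chi, phi.conj() @ Psi)) ** 2
print(p_joint / p_phi)
```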

We pause at this point in the proof to note that this analysis provides a particularly simple way of understanding Hardy's proof of nonlocality. Using reasoning analogous to that above, the relative state to $|\phi'\rangle$ is $|\chi'\rangle$, where $|\chi'\rangle \propto U\rho U^\dagger|\chi\rangle$ (note that there clearly exist choices of $\rho$ and $U$ such that $|\chi'\rangle \neq |\chi\rangle$ up to a global phase). If the transitivity of implication held, then by this sequence of perfect inferences, we would conclude that whenever $|\phi\rangle\langle\phi|$ is found on A, it would be the case that $|\chi'\rangle\langle\chi'|$ is necessarily found on B. However, this conclusion is false because the relative state to $|\phi\rangle$ is $|\chi\rangle$, so that the probability of finding $|\chi'\rangle\langle\chi'|$ on B is $|\langle\chi'|\chi\rangle|^2$, which is less than one if $|\chi'\rangle \neq |\chi\rangle$. Thus transitivity must fail.

We now show that a quantum proof of nonlocality cannot be constructed in terms of a sequence of inferences wherein the consequent of the last inference contradicts the antecedent of the first. We define a set of $N$ observables on A, each of which is projective, namely, $\{|\phi^{(i)}\rangle\langle\phi^{(i)}| : i = 1, \ldots, N\}$, and a set of $N$ similar observables on B, $\{|\chi^{(i)}\rangle\langle\chi^{(i)}| : i = 1, \ldots, N\}$. Here, $|\chi^{(i)}\rangle$ is the relative state to $|\phi^{(i)}\rangle$ and $|\phi^{(i+1)}\rangle$ is the relative state to $|\chi^{(i)}\rangle$. This implies that we can infer from finding $|\phi^{(i)}\rangle\langle\phi^{(i)}|$ on A the necessity of finding $|\chi^{(i)}\rangle\langle\chi^{(i)}|$ on B, and from finding $|\chi^{(i)}\rangle\langle\chi^{(i)}|$ on B the necessity of finding $|\phi^{(i+1)}\rangle\langle\phi^{(i+1)}|$ on A. If transitivity of implication held, then we could chain these inferences together such that from finding $|\phi^{(1)}\rangle\langle\phi^{(1)}|$ on A, we would infer the necessity of finding $|\phi^{(N+1)}\rangle\langle\phi^{(N+1)}|$ on A.

The question is whether we can ever have such a chain where $|\phi^{(N+1)}\rangle$ is orthogonal to $|\phi^{(1)}\rangle$. Note from the analysis above that $|\phi^{(i+1)}\rangle \propto \rho^T|\phi^{(i)}\rangle$, so that

$$|\phi^{(N+1)}\rangle \propto (\rho^T)^N|\phi^{(1)}\rangle. \tag{67}$$

Therefore the condition for orthogonality is

$$|\langle\phi^{(1)}|(\rho^T)^N|\phi^{(1)}\rangle|^2 = 0. \tag{68}$$

Given the non-negativity of $\rho$ (and hence of $\rho^T$ and $(\rho^T)^N$), this condition is only satisfied if $\rho^T|\phi^{(1)}\rangle = 0$, since $\langle\phi^{(1)}|(\rho^T)^N|\phi^{(1)}\rangle = \|(\rho^T)^{N/2}|\phi^{(1)}\rangle\|^2$ and a positive operator and its powers share the same kernel. But this would imply that the probability of finding $|\phi^{(1)}\rangle\langle\phi^{(1)}|$ on the bipartite state $|\Psi\rangle$, namely $\langle\phi^{(1)}|\rho^T|\phi^{(1)}\rangle$, vanishes. In other words, the only bipartite state for which we can have a chain of inference wherein the final consequent denies the initial antecedent is one that denies the initial antecedent. Therefore, such a contradiction cannot be achieved. This is in contrast to what occurs in the case of PR boxes and the nonlocal OS correlations, and is therefore a feature which distinguishes quantum theory from these foil theories. It is interesting to note, however, that in quantum proofs of contextuality one can find a chain of inferences where the final consequent denies the initial antecedent and the initial antecedent is sometimes true, as shown in Sec. III C.
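The positivity step can be illustrated numerically: for $\rho \ge 0$, the quantity in Eq. (68) vanishes exactly when $\rho^T|\phi\rangle = 0$, i.e. when the antecedent has probability zero. The sketch below uses an arbitrarily chosen rank-2 $\rho$ (our assumption, so that $\rho^T$ has a nontrivial kernel) and evaluates both quantities for random $|\phi\rangle$ and for a vector in that kernel.

```python
import numpy as np

def chain_overlap(rho, phi, N):
    """|<phi|(rho^T)^N|phi>|^2, the orthogonality condition of Eq. (68)."""
    return float(abs(np.vdot(phi, np.linalg.matrix_power(rho.T, N) @ phi)) ** 2)

def antecedent_probability(rho, phi):
    """<phi|rho^T|phi>: probability of finding |phi><phi| on A for |Psi> of Eq. (63)."""
    return float(np.vdot(phi, rho.T @ phi).real)

rng = np.random.default_rng(seed=2)
rho = np.diag([0.6, 0.4, 0.0]).astype(complex)     # rank 2, so rho^T has a kernel
kernel_vec = np.array([0, 0, 1], dtype=complex)     # rho^T @ kernel_vec = 0

results = []
for N in (1, 2, 5):
    phi = rng.normal(size=3) + 1j * rng.normal(size=3)
    phi /= np.linalg.norm(phi)
    results.append({
        "N": N,
        "random: overlap": chain_overlap(rho, phi, N),
        "random: p(antecedent)": antecedent_probability(rho, phi),
        "kernel: overlap": chain_overlap(rho, kernel_vec, N),
        "kernel: p(antecedent)": antecedent_probability(rho, kernel_vec),
    })

for row in results:
    print(row)
```

In every row the overlap and the antecedent probability vanish together, which is the content of the argument above.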



V. FRUSTRATED NETWORKS

It is instructive to consider a network representation of the various correlations that we have considered thus far. The bit associated with the outcome of a binary-outcome measurement (this is the only type of measurement we have considered) is associated with a node. Perfect positive correlation between outcomes of distinct measurements is represented by a solid line between the nodes, perfect negative correlation by a dashed line. Such representations of correlations have been discussed before in the context of nonlocality proofs, in particular by Mitchell, Popescu and Roberts [47], in the Ph.D. thesis of Collins […], and by Schmidt […].

Fig. 5 provides network representations of the extremal correlations that were used in the no-go theorems for measurement-noncontextual outcome-deterministic models. The triangular network represents the OS correlations in Specker's parable; the square network represents the PR-box correlations (understood as a proof of contextuality, i.e., where the four measurements are considered to be implemented in one spatial location); the pentagonal network represents the extremal version of the correlations in Klyachko's no-go theorem; the hexagonal network represents the kind of correlations described by Vaidman in Ref. [38].

FIG. 5. Frustrated networks representing the extremal correlations in various proofs of contextuality.

Fig. 6 provides network representations of the extremal correlations that were used in proofs of nonlocality. We have labeled the nodes to highlight the spatial region in which each of the outcomes occurs. The network on the left, which is graph-isomorphic to the square network of Fig. 5, represents the correlations generated by a PR box [17]. The network on the right depicts the correlations found in the separated pair of single-query 3-box systems of Sec. IV.

FIG. 6. Frustrated networks representing the extremal correlations in various proofs of nonlocality.



Let the bit describing whether there is an even or an odd number of dashed lines along a path be called the parity of the path. We shall say that a network is frustrated if for any pair of nodes, there exist paths with different parities connecting those nodes. Clearly, each of the networks in Fig. 5 is frustrated. It is this frustration which captures the impossibility of an outcome-deterministic measurement-noncontextual model of these correlations. For the networks given in Fig. 6, this impossibility also gives rise to a simple proof of nonlocality of the depicted correlations.

For any network, we can determine whether or not it is frustrated by looking only at its cycles. This is because frustration occurs when there are two paths with differing parities, and this fact will reveal itself by examining the cycle consisting of that pair of paths. Thus, to see the ways in which a network can be frustrated, it suffices to consider the ways in which cycles can be frustrated. For any integer number of nodes, it is straightforward to find all the frustrated cycles with that number of nodes. For two nodes, there is only a single path and therefore no possibility for frustration. At 3 nodes, the frustrated networks are those indicated in Fig. 7. The case of two correlations and one anti-correlation corresponds, in the imagery of Specker's parable, to a case where if boxes 1 and 2 or boxes 1 and 3 are opened, one finds the same outcome, but if boxes 2 and 3 are opened, the outcomes always differ. Note, however, that these different networks are equivalent up to a relabeling of the outcomes and consequently represent essentially the same correlations.



FIG. 7. All the ways in which a triangular network can be frustrated (panels (a) and (b)).



Indeed, all the frustrated networks with a given num- 
ber of nodes can be obtained one from another by a rela- 
beling of the outcomes. It therefore suffices to consider a 
single representative of the equivalence class of frustrated 
networks with a given number of nodes. 
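Whether an undirected network is frustrated can be decided mechanically. The sketch below is a standard breadth-first two-coloring of a signed graph, not an algorithm taken from this paper: it tries to assign 0/1 outcomes consistent with every solid and dashed line and reports frustration as soon as two paths of different parity meet. The helper `relabel` implements the relabeling of one node's outcomes, under which frustration is unchanged. Node numbering and names are our own.

```python
from collections import deque

SOLID, DASHED = 0, 1   # perfect correlation / perfect anti-correlation

def is_frustrated(n_nodes, edges):
    """edges: (u, v, kind) triples. Returns True iff no assignment of 0/1 to
    the nodes satisfies every edge, i.e. iff some cycle has odd parity."""
    adjacency = [[] for _ in range(n_nodes)]
    for u, v, kind in edges:
        adjacency[u].append((v, kind))
        adjacency[v].append((u, kind))
    value = [None] * n_nodes
    for start in range(n_nodes):
        if value[start] is not None:
            continue
        value[start] = 0                      # fix the label of one node per component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, kind in adjacency[u]:
                wanted = value[u] ^ kind      # a dashed line forces a flip
                if value[v] is None:
                    value[v] = wanted
                    queue.append(v)
                elif value[v] != wanted:      # two paths of different parity
                    return True
    return False

def relabel(edges, node):
    """Swap the outcome labels of one node: each incident edge changes type."""
    return [(u, v, kind ^ (node in (u, v))) for u, v, kind in edges]

specker = [(0, 1, DASHED), (1, 2, DASHED), (0, 2, DASHED)]                 # OS triangle
pr_box = [(1, 2, SOLID), (0, 2, DASHED), (0, 3, DASHED), (1, 3, DASHED)]  # Eq. (47): 0=A1, 1=A2, 2=B2, 3=B3
print(is_frustrated(3, specker), is_frustrated(4, pr_box))                # True True
print(is_frustrated(3, relabel(specker, 0)))                              # True
print(is_frustrated(3, [(0, 1, DASHED), (1, 2, DASHED), (0, 2, SOLID)]))  # False
```

Relabeling node 0 of the all-dashed triangle yields the two-solid, one-dashed triangle, the other frustrated case of Fig. 7.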

It is also possible to have a similar graphical representation for some of the no-go theorems for noncontextuality and locality that are based on a failure of transitivity of implication. We represent a set of implications among the values of binary-outcome observables by a directed graph with decorated edges. The implications of interest are of the form $X_1 = x \Rightarrow X_2 = y$, where $x, y \in \{0, 1\}$ and either $y = x$ or $y = x \oplus 1$. We depict this by inserting a directed edge (i.e., an arrow) from the node for $X_1$ to the node for $X_2$ and decorating the base of the arrow with the value $x$; the directed edge is solid if $y = x$ and dashed if $y = x \oplus 1$. Note that this implication can also be written in its contrapositive form as $X_2 = y \oplus 1 \Rightarrow X_1 = x \oplus 1$. Therefore, we can always represent the same implication with an arrow in the opposite direction. When reversing an arrow, the value decorating the arrow flips if the arrow is solid and stays the same if the arrow is dashed (the new decoration is $y \oplus 1$, which equals $x \oplus 1$ for a solid arrow and $x$ for a dashed one).

If the parity is odd around a closed loop in such a directed graph, then the antecedent of the first implication is denied by the consequent of the last implication. Therefore, as long as the antecedent has nonzero probability, we have a failure of the transitivity of implication. Such a directed network is said to be frustrated.
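Chains of implications in such directed networks can likewise be composed mechanically. In this sketch an implication $X_1 = x \Rightarrow X_2 = y$ is encoded as a hypothetical tuple `(X1, x, X2, y)`; `follow` composes a chain as transitivity would, and `contrapositive` reverses an arrow using the contrapositive form $X_2 = y \oplus 1 \Rightarrow X_1 = x \oplus 1$. Specker's three implications, the PR box chain closed by Eq. (62), and the Hardy chain (52)-(54) serve as examples.

```python
def contrapositive(implication):
    """(X1, x, X2, y) encodes X1 = x => X2 = y; returns X2 = y^1 => X1 = x^1."""
    X1, x, X2, y = implication
    return (X2, y ^ 1, X1, x ^ 1)

def follow(chain):
    """Compose a chain of implications (each consequent must be the next
    antecedent) and return (first antecedent, final consequent)."""
    start = chain[0][:2]
    current = start
    for X1, x, X2, y in chain:
        if (X1, x) != current:
            raise ValueError(f"{(X1, x)} does not follow from {current}")
        current = (X2, y)
    return start, current

def denies_own_antecedent(chain):
    """True if, by transitivity, the chain would yield X = x => X = x xor 1."""
    (var0, val0), (var1, val1) = follow(chain)
    return var0 == var1 and val0 != val1

# Specker's parable with s_i <=> X_i = 1: s1 => not s2, not s2 => s3, s3 => not s1
specker = [("X1", 1, "X2", 0), ("X2", 0, "X3", 1), ("X3", 1, "X1", 0)]
# PR box chain closed by Eq. (62): A1=1 => B1=1 => A2=0 => B2=0 => A1=0
pr_striking = [("A1", 1, "B1", 1), ("B1", 1, "A2", 0), ("A2", 0, "B2", 0), ("B2", 0, "A1", 0)]
# Hardy-type chain, Eqs. (52)-(54): it ends on a different observable
hardy = [("A1", 1, "B1", 1), ("B1", 1, "A2", 0), ("A2", 0, "B2", 0)]

print(denies_own_antecedent(specker), denies_own_antecedent(pr_striking))  # True True
print(follow(hardy))                         # (('A1', 1), ('B2', 0))
print(contrapositive(("A2", 0, "B2", 0)))    # ('B2', 1, 'A2', 1)
```

The last line recovers Eq. (49) from its contrapositive form (54).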

In the introduction, we described how Specker's parable implies a failure of the transitivity of implication (under the assumption that value assignments to observables are context-independent). Letting $s_i$ denote the proposition that $X_i = 1$ (box $i$ contains a gem), the set of implications is: $s_1 \Rightarrow \neg s_2$, $\neg s_2 \Rightarrow s_3$, and $s_3 \Rightarrow \neg s_1$. These are represented by the directed network of Fig. 8(a), which is clearly frustrated. The set of implications that are used in the transitivity-based no-go theorem of Sec. III C is represented by the pentagonal version of this directed network, Fig. 8(b), which is also frustrated.

Unlike the undirected frustrated networks, which are composed of a set of correlations some or all of which are only approximated by the quantum correlations, the directed frustrated network of Fig. 8(b) is an exact specification of implications one finds in quantum theory, specifically, those described in the proof of Sec. III C. The only sense in which one could imagine a theory being "more contextual", according to this sort of proof, is by assigning a higher probability to the contradiction-generating valuation of the first observable in the chain. An extremal version of such a proof would be one wherein both possible valuations of the first observable yielded a contradiction. We conjecture that such a proof cannot be found in quantum theory.

FIG. 8. Some frustrated directed networks corresponding to contextual correlations.

Finally, the "striking" form of the PR box correlations, presented in Sec. IV E and associated with the set of implications below Eq. (62), is represented by the frustrated directed network in Fig. 9(a). The generalization of this to the case of the nonlocal OS correlations is represented in Fig. 9(b). As was shown at the end of Sec. IV E, it is not possible to find a quantum state and a set of observables that instantiate such a set of implications while assigning a nonzero probability to the contradiction-generating valuation of the first observable.

FIG. 9. Some frustrated directed networks corresponding to Bell-nonlocal correlations.



VI. NO-GO THEOREMS FOR 
PREPARATION-NONCONTEXTUAL MODELS 

So far, in all of our quantum analogues of Specker's parable, the correlations examined were between the outcomes of pairs of measurements that could be implemented jointly. In this section, we consider the possibility of achieving these correlations between the outcomes of pairs of measurements that are implemented consecutively.



It is important to recognize that one need not rule out the possibility of consecutive measurements to ensure the impossibility of joint measurements. The original version of the Specker parable is misleading in this respect. It asks us to imagine that after opening two boxes, one is simply unable to open the third (as if its lid were glued shut with an unbreakable seal). The literal generalization to arbitrary measurements $M_1$, $M_2$ and $M_3$ that can be measured jointly pairwise but not triplewise would seem to be this: if $M_1$ and $M_2$ have been implemented, a mysterious force prevents us from carrying out the instructions that correspond to implementing $M_3$. However, this conclusion does not follow from a denial of joint measurability as it is defined in Sec. III A. One can always implement $M_3$ following a measurement of $M_1$ and $M_2$ on a preparation $\mathcal{P}$. It is just that the statistics of outcomes of $M_3$ that one thereby obtains are not the same as those one would have obtained if $M_3$ were implemented on $\mathcal{P}$ directly. To be precise, if the joint statistics of outcomes of a pair of measurements $M$ and $M'$ are independent of the order in which they are implemented, then the consecutive implementation of the two measurements constitutes a joint measurement of $M$ and $M'$ by the definition of Sec. III A. Consequently, a denial of joint measurability implies a denial of the invariance of statistics under a reordering of the measurements. This way of interpreting a lack of joint measurability is precisely the one that is familiar from the quantum theory of projective measurements.

To see how the OS correlations might obtain for consecutive measurements, we present a new parable. We consider a single-query 3-box system, that is, one where only a single box can be opened at a time. A pair of boxes can be opened consecutively, but the second box-opening need not reproduce the statistics of outcomes that would have been observed had it been opened first. In this sense, the measurements associated with opening distinct boxes cannot be implemented jointly.

We now get to the specifics of the correlations, which are inspired by the original Specker parable. We assume that there is a special preparation $\mathcal{P}^*$ of the 3-box system with the following property: if the same box is opened at the two times, then the same outcome is found, while if different boxes are opened at the two times, then different outcomes are found.

So far, there is nothing in this set of correlations that prohibits their being explained by a generalized-noncontextual ontological model. Because no two measurements are ever implemented jointly in this parable, there is no sense in which any measurement has a nontrivial context upon which its ontological representation might depend. Indeed, there are ontological models that explain the correlations easily. They need only posit that the first measurement disturbs the ontic state of the three-box system in order to enforce the appropriate correlations. For instance, suppose that three bits specify the gem occupation numbers of the three boxes and completely characterize the ontic state. It could be that finding a 0 (1) for a box forces the other two boxes to have occupation number 1 (0). (Indeed, if the suitor is opening boxes on a table, this kind of disturbance to the ontic state might be enforced by a hidden mechanism under the table that automatically inserts gems into, or removes gems from, the two boxes that were not opened.)
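The disturbance model just described can be checked exhaustively. The Python sketch below is a minimal version of it: the ontic state is a triple of bits, and opening a box reveals its bit and then sets the other two boxes to the opposite value. The function name `open_box` is hypothetical. Because the correlations hold for every initial ontic state, the special preparation $\mathcal{P}^*$ can be any distribution over the eight triples in this toy model.

```python
from itertools import product

def open_box(state, box):
    """Toy disturbance model: reveal the occupation number of `box` (0-indexed),
    then force the other two boxes to the opposite occupation number."""
    x = state[box]
    after = tuple(x if k == box else 1 - x for k in range(3))
    return x, after

# Two-time correlations for every initial ontic state and every pair of boxes
results = []
for state in product((0, 1), repeat=3):
    for first, second in product(range(3), repeat=2):
        x1, after = open_box(state, first)
        x2, _ = open_box(after, second)
        # same box -> same outcome; different boxes -> different outcomes
        results.append((first == second) == (x1 == x2))

print(all(results))  # True
```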

To obtain a set of correlations that can challenge the assumption of generalized noncontextuality, we need to modify the thought experiment slightly by adding the following assumption. In addition to the correlations described, after the early measurement is complete it is impossible to obtain any information about the identity of the early measurement, for every possible subsequent measurement (the theory may well allow more than the three measurements that are used in the protocol). We call this the trit-obliviousness condition (this terminology has its precedent in Ref. [50]).

Note that implementing the early measurement procedure and selecting a particular outcome constitutes a preparation. For each of the three possible measurement procedures, $M_1$, $M_2$ and $M_3$, and each of the outcomes 0 and 1, we obtain a distinct preparation procedure. We denote these by $P_{1,0}$, $P_{1,1}$, $P_{2,0}$, $P_{2,1}$, $P_{3,0}$ and $P_{3,1}$ in an obvious notation. We can also define the preparations that result when one chooses not to condition on the outcome of the measurement procedure. We denote these by $P_1$, $P_2$ and $P_3$. Finally, we denote the probability of obtaining outcome 0 when the first measurement $M_t$ is implemented on the special preparation $\mathcal{P}^*$ by $w_{t,0} \equiv p(0|M_t;\mathcal{P}^*)$, and we define $w_{t,1} \equiv 1 - w_{t,0}$. The statistics for the unconditional preparations are then given by

$$p(X|M;P_t) = w_{t,0}\,p(X|M;P_{t,0}) + w_{t,1}\,p(X|M;P_{t,1}). \tag{69}$$

The trit-obliviousness condition states that preparation procedures $P_1$, $P_2$ and $P_3$ are operationally equivalent, that is,

$$\forall M:\quad p(X|M;P_1) = p(X|M;P_2) = p(X|M;P_3). \tag{70}$$

We now see how the new parable might run into trouble with generalized noncontextuality. In Sec. III D, we defined a preparation-noncontextual ontological model to be one wherein operational equivalence of preparation procedures implies that they are represented by the same distributions in the ontological model, cf. Eq. (17). Thus, from Eq. (70) and preparation noncontextuality, we infer that

$$p(\lambda|P_1) = p(\lambda|P_2) = p(\lambda|P_3). \tag{71}$$

Because implementing the first measurement and selecting a particular outcome constitutes a preparation, one can equally well describe this section as a consideration of the possibility of achieving analogues of the OS correlations between preparations and measurements. This is discussed further below.

Note that measurement procedures $M_i$ and $M'_i$ that are in the same operational equivalence class $\mathcal{M}_i$ may nonetheless define preparation procedures that fail to be operationally equivalent, because operational equivalence for preparation procedures is decided by the statistics of all possible subsequent measurements.

Given that convex combinations of preparation procedures are represented in an ontological model by convex combinations of the associated distributions (see Ref. [13]), we infer from Eq. (71) that

$$w_{1,0}\,p(\lambda|P_{1,0}) + w_{1,1}\,p(\lambda|P_{1,1}) = w_{2,0}\,p(\lambda|P_{2,0}) + w_{2,1}\,p(\lambda|P_{2,1}) = w_{3,0}\,p(\lambda|P_{3,0}) + w_{3,1}\,p(\lambda|P_{3,1}). \tag{72}$$

The "preparation context" is the specification of which of the three mixtures of preparation procedures was implemented, and the assumption of preparation noncontextuality is that the distribution over $\lambda$ does not depend on this context.

Equation (72) is a nontrivial constraint. It is not necessarily consistent with the posited correlations between the preparation procedures $P_{1,0}$, $P_{1,1}$, $P_{2,0}$, $P_{2,1}$, $P_{3,0}$ and $P_{3,1}$ and the outcomes of the subsequent measurements of $M_1$, $M_2$ or $M_3$. Indeed, consider the ontological model we proposed above, where the ontic state is a triple of bits specifying the occupation numbers of each box. In this model, the distributions corresponding to the six preparation procedures are:

$$p(\lambda|P_{1,0}) = \delta_{\lambda,(0,1,1)}, \tag{73}$$
$$p(\lambda|P_{1,1}) = \delta_{\lambda,(1,0,0)}, \tag{74}$$
$$p(\lambda|P_{2,0}) = \delta_{\lambda,(1,0,1)}, \tag{75}$$
$$p(\lambda|P_{2,1}) = \delta_{\lambda,(0,1,0)}, \tag{76}$$
$$p(\lambda|P_{3,0}) = \delta_{\lambda,(1,1,0)}, \tag{77}$$
$$p(\lambda|P_{3,1}) = \delta_{\lambda,(0,0,1)}; \tag{78}$$

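where $\delta$ denotes the Kronecker delta function. Supposing that $w_{1,0} = w_{2,0} = w_{3,0} = 1/2$, we have

$$p(\lambda|P_1) = \tfrac12\,\delta_{\lambda,(0,1,1)} + \tfrac12\,\delta_{\lambda,(1,0,0)}, \tag{79}$$
$$p(\lambda|P_2) = \tfrac12\,\delta_{\lambda,(1,0,1)} + \tfrac12\,\delta_{\lambda,(0,1,0)}, \tag{80}$$
$$p(\lambda|P_3) = \tfrac12\,\delta_{\lambda,(1,1,0)} + \tfrac12\,\delta_{\lambda,(0,0,1)}, \tag{81}$$

and therefore

$$p(\lambda|P_1) \neq p(\lambda|P_2) \neq p(\lambda|P_3). \tag{82}$$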

So we find that the distributions representing $P_1$, $P_2$ and $P_3$ in the ontological model are distinct even though these preparation procedures are operationally equivalent. This is a violation of the assumption of preparation noncontextuality.

Remember that in the ontological model we are considering,

We have demonstrated that the simple ontological model suggested earlier to explain the two-time OS correlations cannot also satisfy the condition of trit-obliviousness while preserving preparation noncontextuality. In the next subsection, we will show that no ontological model that can explain the OS correlations and the trit-obliviousness condition can be preparation-noncontextual. In this sense, a suitor who is committed to generalized noncontextuality should be surprised if he sees the specified two-time correlations after having confirmed the trit-obliviousness condition.
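Equations (73)-(82) can be checked directly. The following Python sketch uses exact fractions and assumes $w_{t,0} = w_{t,1} = 1/2$, as in the text. It builds the mixtures $p(\lambda|P_t)$ of the toy three-bit model and confirms that they differ at the ontic level. It also confirms that $M_1$, $M_2$, $M_3$ (each of which reveals one bit) cannot tell them apart. It only checks these three measurements, not every conceivable measurement. The helper `post_state` is a hypothetical name.

```python
from fractions import Fraction

def post_state(t, b):
    """Ontic state after box t (1-indexed) yields outcome b in the toy model:
    box t holds b and the other two boxes hold 1 - b (cf. Eqs. (73)-(78))."""
    return tuple(b if k == t else 1 - b for k in (1, 2, 3))

half = Fraction(1, 2)
# p(lambda | P_t) with w_{t,0} = w_{t,1} = 1/2, as in Eqs. (79)-(81)
mixture = {t: {post_state(t, 0): half, post_state(t, 1): half} for t in (1, 2, 3)}

# Ontic level: the three mixtures are pairwise distinct distributions, Eq. (82)
pairwise_distinct = all(mixture[s] != mixture[t]
                        for s in (1, 2, 3) for t in (1, 2, 3) if s < t)

# Operational level: M_y reveals bit y, so compute p(X = 0 | M_y; P_t)
p_zero = {(y, t): sum(w for lam, w in mixture[t].items() if lam[y - 1] == 0)
          for y in (1, 2, 3) for t in (1, 2, 3)}

print(pairwise_distinct, set(p_zero.values()))
```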

It is useful to summarize the correlations that we have 
described above. 



A. Diachronic pair of single-query 3-box OS correlations

There are six possible preparation procedures, denoted $P_{t,b}$ where $t \in \{1,2,3\}$ ($t$ for trit) and $b \in \{0,1\}$, and three possible measurement procedures, denoted $M_y$ where $y \in \{1,2,3\}$. For simplicity, we assume the prior over each of $t$, $b$ and $y$ to be uniform. The outcome $X$ of the measurement procedure $M_y$ given a preparation procedure $P_{t,b}$ is the following function of $t$, $b$ and $y$:

$$c_y(t,b) = \begin{cases} b & \text{if } y = t,\\ b \oplus 1 & \text{if } y \neq t,\end{cases} \tag{83}$$

that is, the correlations are such that

$$p(X = c_y(t,b)\,|\,M_y;P_{t,b}) = 1. \tag{84}$$

Finally, defining the effective preparation procedure $P_t$ as the mixture of $P_{t,0}$ and $P_{t,1}$, it is assumed that no measurement can reveal any information about which of $P_1$, $P_2$ or $P_3$ was implemented:

$$\forall M:\quad p(X|M;P_1) = p(X|M;P_2) = p(X|M;P_3). \tag{85}$$

This is the trit-obliviousness condition.

Defining the average probability of success as

$$R_3 \equiv \frac{1}{18}\sum_{t,b,y} p(X = c_y(t,b)\,|\,M_y;P_{t,b}), \tag{86}$$

we can also characterize the two-time OS correlations as those achieving $R_3 = 1$. Using the trit-obliviousness condition, we shall see that the assumption of preparation noncontextuality places a bound on the average probability of success, namely,

$$R_3 \le \frac{7}{9}. \tag{87}$$

We refer to this bound as a noncontextuality inequality.

the measurement $M_y$ simply reveals the value of the $y$th bit, that is, $p(X = \lambda_y|M_y;(\lambda_1,\lambda_2,\lambda_3)) = 1$. It follows that $\sum_\lambda p(X = 0|M_y;\lambda)\,p(\lambda|P_t) = \frac12$ for all $y, t \in \{1,2,3\}$, and consequently, the two outcomes of $M_y$ occur with equal probability given a preparation procedure $P_t$. Therefore, the ontological model captures the fact that $P_1$, $P_2$ and $P_3$ are operationally indistinguishable.

The proof is as follows. For any measurement $M$, the probability of outcome $X$ given preparation $P_t$ is simply

$$p(X|M;P_t) = \frac12\sum_{b\in\{0,1\}} p(X|M;P_{t,b}). \tag{88}$$

Similarly, the probability of the ontic state $\lambda$ given an implementation of $P_t$ is simply

$$p(\lambda|P_t) = \frac12\sum_{b\in\{0,1\}} p(\lambda|P_{t,b}). \tag{89}$$

We combine the trit-obliviousness condition, Eq. (85), with the assumption of preparation noncontextuality, Eq. (17). From these we infer that $p(\lambda|P_1) = p(\lambda|P_2) = p(\lambda|P_3)$. This states that mixed preparations corresponding to different values of the trit $t$ are indistinguishable not only at the operational level, but at the ontic level as well. Therefore, even if one knew $\lambda$, the posterior probabilities for $t = 1$, $t = 2$ and $t = 3$ would be the same, that is, one would know nothing about the trit $t$. The argument so far can be summarized as follows: for preparation-noncontextual models, trit-obliviousness at the operational level implies trit-obliviousness at the ontic level. The ontic state $\lambda$ provides a classical encoding of $(t,b)$, but one that does not contain any information about $t$.

To finish the argument, we take note of all the functions of $t$ and $b$ that contain no information about $t$. Up to an affine transformation (i.e., up to a scalar multiple and an additive constant), these are equivalent to one of the following four functions:

$$\begin{array}{cc|cccc} t & b & b & c_1(t,b) & c_2(t,b) & c_3(t,b)\\ \hline 1 & 0 & 0 & 0 & 1 & 1\\ 1 & 1 & 1 & 1 & 0 & 0\\ 2 & 0 & 0 & 1 & 0 & 1\\ 2 & 1 & 1 & 0 & 1 & 0\\ 3 & 0 & 0 & 1 & 1 & 0\\ 3 & 1 & 1 & 0 & 0 & 1 \end{array}$$

where $c_y(t,b)$ is defined in Eq. (83). In an ontological model that respects preparation noncontextuality and the trit-obliviousness condition, the ontic state must be given by one of these four functions, that is, $p(\lambda|P_{t,b}) = \delta_{\lambda,b}$ or $\delta_{\lambda,c_1(t,b)}$ or $\delta_{\lambda,c_2(t,b)}$ or $\delta_{\lambda,c_3(t,b)}$. Note that in each case, the ontic state space is a single bit.

In the sense that for any given value of the function $f(t,b)$, the conditional probability $p(t|f(t,b)) = 1/3$ for all $t$.

In the case of an ontological model wherein $\lambda = b$, the best the measurement device can do is to always output $b \oplus 1$. This is because with probability 2/3, $y \neq t$ and $c_y(t,b) = b \oplus 1$, while with probability 1/3, $y = t$ and $c_y(t,b) = b$. Thus, for this ontological model, the average success probability is 2/3.

In the case of an ontological model wherein $\lambda = c_1(t,b)$, the best the measurement device can do is to output $c_1(t,b)$ when $y = 1$ and $c_1(t,b) \oplus 1$ when $y \neq 1$. Note that $c_1(t,b) \oplus 1 = c_2(t,b)$ for 2/3 of the values of $t, b$, and $c_1(t,b) \oplus 1 = c_3(t,b)$ also for 2/3 of the values of $t, b$. (To see this, it suffices to take the negation of the $c_1(t,b)$ column of the table and compare it with the $c_2(t,b)$ and $c_3(t,b)$ columns.) So we see that this choice of output generates the right correlations 2/3 of the time for $y \neq 1$. Thus for this ontological model, the overall success probability is $\frac13 \cdot 1 + \frac23 \cdot \frac23 = \frac79$.

By symmetry, the cases of $\lambda = c_2(t,b)$ and $\lambda = c_3(t,b)$ also achieve a success probability of at most 7/9. Therefore, the probability of success in a preparation-noncontextual ontological model is bounded above by 7/9.
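The case analysis above can be reproduced by brute force. The following Python sketch covers the deterministic single-bit encodings considered in the text. It enumerates every map $f:(t,b)\mapsto\lambda\in\{0,1\}$ and keeps those that carry no information about $t$. It then lets each $M_y$ apply the best of the four possible response functions $g_y:\{0,1\}\to\{0,1\}$. All helper names are illustrative, not taken from the text.

```python
from itertools import product

TRITS, BITS = (1, 2, 3), (0, 1)
INPUTS = [(t, b) for t in TRITS for b in BITS]  # the six preparations P_{t,b}

def c(y, t, b):
    """Target outcome c_y(t, b) of Eq. (83)."""
    return b if y == t else b ^ 1

def trit_oblivious(f):
    """True if, for each value of lambda = f(t, b), every trit is equally likely."""
    for v in BITS:
        counts = {sum(f[(t, b)] == v for b in BITS) for t in TRITS}
        if len(counts) != 1:
            return False
    return True

def best_success(f):
    """Largest average success when the ontic state is lambda = f(t, b) and M_y
    outputs g_y(lambda) for the best of the four maps g_y: {0,1} -> {0,1}."""
    wins = 0
    for y in TRITS:
        # g is encoded as the pair (g(0), g(1))
        wins += max(sum(g[f[tb]] == c(y, *tb) for tb in INPUTS)
                    for g in product(BITS, repeat=2))
    return wins / len(INPUTS) / len(TRITS)

encodings = [dict(zip(INPUTS, values)) for values in product(BITS, repeat=6)]
oblivious = [f for f in encodings if trit_oblivious(f)]
R_pnc = max(best_success(f) for f in oblivious)
print(len(oblivious), R_pnc)  # 10 encodings; R_pnc equals 7/9 up to float rounding
```

The ten trit-oblivious encodings are the two constant maps plus $b$, $c_1$, $c_2$, $c_3$ and their negations. The best of them reaches exactly the bound of Eq. (87).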

It is not a triple of bits and therefore cannot specify the occupation numbers of each of the three boxes. In this sense, the narrative device of a three-box system cannot do justice to this version of the parable. We must think about the preparations and measurements more abstractly.

B. Quantum case

We now consider to what extent one can achieve the diachronic OS correlations in quantum theory. The following is a protocol that uses a single qubit. The three measurements correspond to the three Pauli operators $A_t$ of Eq. (41), associated with directions equally spaced in an equatorial plane of the Bloch sphere. The positive and negative eigenvalues are mapped onto outputs $X = 0$ and $X = 1$ respectively. The preparation procedures $P_{t,0}$ and $P_{t,1}$ correspond to the two eigenstates of $A_t$, with positive and negative eigenvalues mapped onto $b = 0$ and $b = 1$ respectively. We denote these states by the Hilbert space vectors $|\phi_{t,b}\rangle$. The Bloch sphere representation of these states and measurements is provided in Fig. 10. When $y = t$, the preparation corresponds to an eigenstate of the observable being measured, and the outcome $X$ equals the bit $b$. Thus, $X = c_y(t,b)$ with probability 1 in this case. When $y \neq t$, the probability of obtaining $X = b$ is $|\langle\phi_{t,b}|\phi_{y,b}\rangle|^2 = \cos^2(\pi/3) = 1/4$. The probability of obtaining $X = b \oplus 1$, and thus $X = c_y(t,b)$, is therefore $3/4$. We have $y \neq t$ in 2/3 of cases, so that the overall probability of success is

$$R_3^{\rm quantum} = \frac13 \cdot 1 + \frac23 \cdot \frac34 = \frac56.$$

Meanwhile, no information about $t$ can be obtained by any quantum measurement, given that the mixtures associated with different values of $t$ are represented by the same density operator:

$$\tfrac12|\phi_{1,0}\rangle\langle\phi_{1,0}| + \tfrac12|\phi_{1,1}\rangle\langle\phi_{1,1}| = \tfrac12|\phi_{2,0}\rangle\langle\phi_{2,0}| + \tfrac12|\phi_{2,1}\rangle\langle\phi_{2,1}| = \tfrac12|\phi_{3,0}\rangle\langle\phi_{3,0}| + \tfrac12|\phi_{3,1}\rangle\langle\phi_{3,1}| = \mathbb{1}/2.$$

Since $5/6 > 7/9$, we have a violation of the noncontextuality inequality of Eq. (87).
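The qubit protocol can be simulated numerically. This Python sketch assumes the three equally spaced directions lie in the $x$-$y$ plane at angles $0$, $2\pi/3$ and $4\pi/3$; any equatorial plane would give the same numbers. It computes $R_3$ of Eq. (86) from Born-rule probabilities $\mathrm{Tr}(\Pi_{t,b}\Pi_{y,x})$ and checks that the three unconditioned mixtures coincide. The helper names are illustrative.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Assumed trine directions in the x-y (equatorial) plane, 120 degrees apart
ANGLES = {1: 0.0, 2: 2 * np.pi / 3, 3: 4 * np.pi / 3}
A = {t: np.cos(a) * SX + np.sin(a) * SY for t, a in ANGLES.items()}

def proj(t, b):
    """Projector onto |phi_{t,b}>: b = 0 <-> eigenvalue +1 of A_t, b = 1 <-> -1.
    The same map sends outcome X of M_t to the eigenvalue (+1 for X = 0)."""
    return (I2 + (1 - 2 * b) * A[t]) / 2

def c(y, t, b):
    """Target outcome c_y(t, b) of Eq. (83)."""
    return b if y == t else b ^ 1

# Average success of Eq. (86) with uniform t, b, y (Born rule for pure states)
R3 = sum(np.trace(proj(t, b) @ proj(y, c(y, t, b))).real
         for t in (1, 2, 3) for b in (0, 1) for y in (1, 2, 3)) / 18

# Trit-obliviousness: every unconditioned mixture is the maximally mixed state
mixtures = [(proj(t, 0) + proj(t, 1)) / 2 for t in (1, 2, 3)]
print(R3)  # close to 5/6, above the preparation-noncontextual bound 7/9
```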



FIG. 10. Quantum states and observables used for the proof of the impossibility of a preparation-noncontextual ontological model.

Note that the OS correlations are useful for achieving the following two-party secure computation, which is a kind of multiplexing. Let the two parties be called Alice and Bob. Alice has as input a trit $t \in \{1,2,3\}$ and a bit $b \in \{0,1\}$, each chosen uniformly at random. Bob has as input a trit $y \in \{1,2,3\}$ chosen uniformly at random. Bob outputs a bit $c$, and the goal of the task is for Bob to output $c = c_y(t,b)$; that is, Bob should output $b$ if $y = t$ and the negation of $b$ otherwise. Alice can send a system to Bob encoding information about her input. However, there is a cryptographic constraint: no information about the trit $t$ can be transmitted to Bob, which is to say that the protocol must be trit-oblivious. This information-theoretic manner of characterizing the correlations provides a connection with the discussion of preparation noncontextuality found in Ref. [50].



C. Justifying preparation noncontextuality by locality

As discussed in Ref. [12], it is sometimes possible to justify an assumption of preparation noncontextuality using Bell's assumption of local causality [33]. This is the case for the assumptions of preparation noncontextuality that appear in the derivation of the noncontextuality inequality of Eq. (87). It suffices to note the following: if one implements a measurement procedure on half of a correlated pair of systems and conditions upon its outcome, then this procedure can also be considered a preparation procedure for the other half of the pair. Indeed, consider the separated pair of single-query 3-box systems of Sec. IV A. Every measurement procedure $M_t$ on the 3-box system in Abydos, chosen from $t \in \{1,2,3\}$ and yielding outcome $b \in \{0,1\}$, corresponds to a preparation procedure $P_{t,b}$ for the 3-box system in Babylon. If $M_t$ is measured in Abydos but one does not condition on the outcome, then this corresponds to a preparation procedure $P_t$ of the system in Babylon. In this case, consider the probability of observing an outcome $X$ for a measurement of $M_y$ in Babylon given a preparation $P_{t,b}$. It is precisely equal to the probability of observing an outcome $X$ for a measurement of $M_y$ in Babylon given an outcome $b$ for $M_t$ in Abydos. There is an isomorphism between the diachronic pair of single-query 3-box systems and the separated pair.

Now suppose that the Abydosian and Babylonian measurements are space-like separated. In this case, the no-signaling constraint ensures that the choice of $t$ in Abydos cannot influence the outcome statistics of any measurement in Babylon. Consequently, the three preparation procedures $P_1$, $P_2$ and $P_3$ are operationally equivalent, that is, $\forall M: p(X|M;P_1) = p(X|M;P_2) = p(X|M;P_3)$. This is the condition of trit-obliviousness.

Furthermore, an assumption of local causality implies that the choice of measurement in Abydos also cannot influence the distribution over ontic states for the 3-box system in Babylon. Denoting the ontic state of the Babylonian system by $\lambda$, local causality implies $p(\lambda|P_1) = p(\lambda|P_2) = p(\lambda|P_3)$. But this is precisely the content of the assumption of preparation noncontextuality for the operationally equivalent procedures $P_1$, $P_2$ and $P_3$. Therefore local causality justifies this assumption.

This reasoning also shows that any local strategy for winning the prediction game for the separated pair of single-query 3-box systems implies a preparation-noncontextual strategy for winning the prediction game for the diachronic pair with the same winning probability. This gives another way to derive the local bound of 7/9 for the probability of achieving the OS correlations for the separated pair, Eq. (31). One appeals to this implication together with the fact that the optimal preparation-noncontextual strategy achieves a winning probability of 7/9 for the diachronic pair, as shown in Eq. (87).


It has been shown that for every inequality on correlations between pairs of separated measurements that is implied by the assumption of a local ontological model, an equivalent inequality for the correlations between preparations and measurements is implied by the assumption of a preparation-noncontextual ontological model [51].

VII. JOINT MEASURABILITY OF POVMS

As we showed early on, we cannot find a triple of projective measurements in quantum theory that are jointly measurable pairwise but not triplewise. However, not all measurements in quantum theory are projective. The most general measurement is one that is associated with a positive operator valued measure (POVM). A POVM is a set of operators $\{E_X : X \in S\}$ such that $E_X \ge 0$ and $\sum_X E_X = \mathbb{1}$. The parameter $X$ labels the outcomes of the measurement, which we assume form a discrete set. If the preparation procedure preceding the measurement is represented by the density operator $\rho$, then the probability of outcome $X$ is given by $\mathrm{Tr}(\rho E_X)$.

In this section, we consider the question of whether one could find a triple of nonprojective measurements in quantum theory that are pairwise but not triplewise jointly measurable. As it turns out, this is indeed possible.

First, we adapt the definition of joint measurability to the case of POVMs. A pair of measurements associated with POVMs $\{E^1_{X_1}\}$ and $\{E^2_{X_2}\}$ are jointly measurable iff there exists a third POVM $\{F_{X_1,X_2}\}$ such that $E^1_{X_1} = \sum_{X_2} F_{X_1,X_2}$ and $E^2_{X_2} = \sum_{X_1} F_{X_1,X_2}$. It is worth noting that the problem of mathematically characterizing jointly measurable observables when these are not projective is a subject of ongoing research [13-16].

We will consider two examples of triples of POVMs such that any pair can be implemented jointly, but the triple cannot. They both make use of noisy spin observables. The three measurements we consider, labelled by an integer $k \in \{1,2,3\}$, are associated with POVMs $\{E^k_+, E^k_-\}$, where

$$E^k_\pm = \tfrac12\left(\mathbb{1} \pm \eta\,\vec\sigma\cdot\hat n_k\right), \qquad 0 \le \eta \le 1, \tag{92}$$

where $\vec\sigma = (\sigma_x, \sigma_y, \sigma_z)$ is the vector of Pauli spin operators, and $\hat n_1$, $\hat n_2$ and $\hat n_3$ are the three axes along which the spin is measured. The POVM $\{E^k_+, E^k_-\}$ can be written as a convex combination of two measurements. The first is the projective spin measurement along $\hat n_k$, associated with the projectors $\Pi^k_\pm = \frac12(\mathbb{1} \pm \vec\sigma\cdot\hat n_k)$. The second is the trivial measurement $\{\mathbb{1}/2, \mathbb{1}/2\}$. That is,

$$E^k_\pm = (1-\eta)\,\frac{\mathbb{1}}{2} + \eta\,\Pi^k_\pm. \tag{93}$$

This is the sense in which we can consider $\{E^k_+, E^k_-\}$ with $\eta < 1$ to be a noisy version of the observable $\vec\sigma\cdot\hat n_k$.
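Equations (92) and (93) are easy to verify numerically. This Python sketch builds the effects $E^k_\pm$ for one illustrative choice ($\hat n = \hat z$, $\eta = 0.6$, both our assumptions). It checks that they form a valid POVM and that they coincide with the convex mixture of Eq. (93). The helper names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))

def sigma_dot(n):
    """sigma . n for a real 3-vector n."""
    return sum(comp * s for comp, s in zip(n, PAULI))

def noisy_spin(n, eta):
    """Effects (E_+, E_-) of Eq. (92) for a unit vector n and sharpness eta."""
    return [(I2 + sign * eta * sigma_dot(n)) / 2 for sign in (+1, -1)]

def noisy_as_mixture(n, eta):
    """Right-hand side of Eq. (93): (1 - eta) * 1/2 + eta * Pi_pm."""
    projectors = [(I2 + sign * sigma_dot(n)) / 2 for sign in (+1, -1)]
    return [(1 - eta) * I2 / 2 + eta * P for P in projectors]

n_hat, eta = np.array([0.0, 0.0, 1.0]), 0.6      # illustrative choice
E = noisy_spin(n_hat, eta)
sums_to_identity = np.allclose(E[0] + E[1], I2)
min_eigenvalue = min(np.linalg.eigvalsh(e).min() for e in E)   # equals (1 - eta)/2
matches_mixture = all(np.allclose(a, b) for a, b in zip(E, noisy_as_mixture(n_hat, eta)))
print(sums_to_identity, min_eigenvalue, matches_mixture)
```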

A. Orthogonal spin axes

Our first example of such a triple of nonprojective measurements uses noisy versions of spin operators along three orthogonal axes:

$$\hat n_1 = \hat z = (0,0,1), \qquad \hat n_2 = \hat x = (1,0,0), \qquad \hat n_3 = \hat y = (0,1,0). \tag{94}$$

Proposition 8. The triple of measurements defined by Eqs. (92) and (94), that is, noisy spin observables along three orthogonal axes, is pairwise jointly measurable iff $\eta \le 1/\sqrt{2} \approx 0.707$, but triplewise jointly measurable iff $\eta \le 1/\sqrt{3} \approx 0.577$.

In other words, the condition $1/\sqrt{3} < \eta \le 1/\sqrt{2}$ is necessary and sufficient for the triple to be pairwise jointly measurable but not triplewise jointly measurable.
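The pairwise threshold in Proposition 8 can be cross-checked with a criterion that is not stated in this text: Busch's known criterion for two unbiased two-outcome qubit POVMs. It says that noisy observables with Bloch vectors $\vec a$ and $\vec b$ are jointly measurable iff $|\vec a + \vec b| + |\vec a - \vec b| \le 2$. The Python sketch below assumes this criterion and solves it for the largest $\eta$. It says nothing about the triplewise threshold $1/\sqrt{3}$.

```python
import numpy as np

def max_pairwise_eta(a, b):
    """Largest sharpness eta such that the unbiased noisy observables with Bloch
    vectors eta*a and eta*b (a, b unit vectors) are jointly measurable, assuming
    Busch's criterion |eta*a + eta*b| + |eta*a - eta*b| <= 2."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 2.0 / (np.linalg.norm(a + b) + np.linalg.norm(a - b))

z_hat, x_hat, y_hat = (0, 0, 1), (1, 0, 0), (0, 1, 0)   # axes of Eq. (94)
thresholds = [max_pairwise_eta(u, v)
              for u, v in [(z_hat, x_hat), (x_hat, y_hat), (z_hat, y_hat)]]
print(thresholds)  # each equal to 1/sqrt(2) up to rounding
```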



This result is proven in Ref. [14], but for completeness, we provide an independent proof in Appendix F. For pedagogical reasons, we also provide a geometric picture in the Bloch sphere of the measurements that saturate these inequalities. To this end, defining the index set $\mathcal{I} \subseteq \{1,2,3\}$, we introduce the (unnormalized Bloch) vectors

$$\vec m_{\{x_k\}_{k\in\mathcal{I}}} = \sum_{k\in\mathcal{I}} x_k\,\hat n_k, \tag{95}$$

where $x_k \in \{-1,+1\}$, and write the respective unit vectors as $\hat m_{\{x_k\}_{k\in\mathcal{I}}}$.

The POVM that measures a noisy spin observable along the $z$-axis jointly with the one along the $x$-axis and that saturates $\eta \le 1/\sqrt{2}$ is of the form

$$F_{X_1X_2} = \tfrac12\,\Pi_{X_1X_2}, \tag{96}$$

where the projectors $\{\Pi_{X_1X_2}\}$ are associated with Bloch vectors $\{\hat m_{X_1X_2}\}$ forming the vertices of a square in the $z$-$x$ plane, depicted in Fig. 11. Coarse-graining over $X_2$ yields the POVM $\{F^1_\pm = \frac12\mathbb{1} + \frac12\vec\sigma\cdot\vec s^{\,1}_\pm\}$ where $\vec s^{\,1}_\pm = \pm\frac{1}{\sqrt2}\hat z$. This is a measurement of the $\eta$-sharp spin observable along the $z$ axis with $\eta = 1/\sqrt{2}$, depicted in Fig. 11. Similarly, coarse-graining over $X_1$ yields a noisy spin observable associated with Bloch vectors $\vec s^{\,2}_\pm = \pm\frac{1}{\sqrt2}\hat x$, which is to say along the $x$ axis with $\eta = 1/\sqrt{2}$. Joint measurements of every other pair of spin axes are described similarly.




FIG. 11. Bloch sphere representation of the joint measurement of the noisy spin observables along the x and z axes.

The POVM that measures noisy spin observables along axes $z$, $x$ and $y$ jointly and that saturates $\eta \le 1/\sqrt{3}$ is of the form $\{F_{X_1X_2X_3} = \frac14\Pi_{X_1X_2X_3}\}$. Here the projectors $\{\Pi_{X_1X_2X_3}\}$ are associated with the Bloch vectors $\{\hat m_{X_1X_2X_3}\}$ forming the vertices of a cube, depicted in Fig. 12. Coarse-graining over $X_2$ and $X_3$ yields the POVM $\{F^1_\pm = \frac12\mathbb{1} + \frac12\vec\sigma\cdot\vec s^{\,1}_\pm\}$ where $\vec s^{\,1}_\pm = \pm\frac{1}{\sqrt3}\hat z$. This is an $\eta$-sharp spin observable along the $z$ axis with $\eta = 1/\sqrt{3}$, also depicted in Fig. 12. Similarly, coarse-graining over $X_1$ and $X_3$ yields a noisy spin observable associated with Bloch vectors $\vec s^{\,2}_\pm = \pm\frac{1}{\sqrt3}\hat x$, while coarse-graining over $X_1$ and $X_2$ yields one associated with $\vec s^{\,3}_\pm = \pm\frac{1}{\sqrt3}\hat y$.

These geometric representations make clear why there is a gap between the noise required for jointly measuring a pair and that required for jointly measuring the triple. The edge of a cube inscribed in a sphere is shorter than the edge of a square inscribed in an equatorial plane of that sphere. (For a unit sphere, the edges are $2/\sqrt{3}$ and $\sqrt{2}$ respectively, and half of each edge is precisely the corresponding threshold value of $\eta$.)

FIG. 12. Bloch sphere representation of the joint measurement of the noisy spin observables along the x, y and z axes.
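The cube construction can be checked directly. This Python sketch builds $F_{X_1X_2X_3} = \frac14\Pi_{X_1X_2X_3}$ from the unit vectors of Eq. (95) with the axes of Eq. (94). It confirms that the effects sum to the identity and extracts $\eta$ from each coarse-grained marginal via $\eta = \mathrm{Tr}(F^k_+\,\vec\sigma\cdot\hat n_k)$. It also compares the result with half the edge of the inscribed cube. The helper names are illustrative.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
PAULI = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))

def sigma_dot(v):
    return sum(comp * s for comp, s in zip(v, PAULI))

def projector(m):
    """Rank-one projector whose Bloch vector is the unit vector m / |m|."""
    m = np.asarray(m, dtype=float)
    return (I2 + sigma_dot(m / np.linalg.norm(m))) / 2

axes = [np.array([0.0, 0.0, 1.0]),   # n_1 = z
        np.array([1.0, 0.0, 0.0]),   # n_2 = x
        np.array([0.0, 1.0, 0.0])]   # n_3 = y

# Cube POVM: F_{x1 x2 x3} = (1/4) Pi_{x1 x2 x3}, one effect per cube vertex
cube = {xs: projector(sum(s * n for s, n in zip(xs, axes))) / 4
        for xs in product((1, -1), repeat=3)}

# Marginal for axis k: F^k_+ = (1 + eta sigma.n_k)/2, so eta = Tr(F^k_+ sigma.n_k)
etas = [np.trace(sum(F for xs, F in cube.items() if xs[k] == 1)
                 @ sigma_dot(axes[k])).real for k in range(3)]

# Half the edge of the cube inscribed in the unit sphere
half_edge = np.linalg.norm(np.array([1, 1, 1]) - np.array([-1, 1, 1])) / np.sqrt(3) / 2
print(np.allclose(sum(cube.values()), I2), etas, half_edge)
```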

Joint measurements of observables along orthogonal spin axes are not very useful for approximating the OS correlations. To see this, define the probability of obtaining anti-correlated outcomes when a pair of nonprojective measurements is implemented jointly, averaged uniformly over the three pairs:

$$R \equiv \frac13\sum_{(k,k')\in\{(1,2),(2,3),(3,1)\}} p(X_k \neq X_{k'}). \tag{97}$$

We then find the following result.



Proposition 9. For the triple of measurements defined by Eqs. (92) and (94), that is, noisy spin observables along three orthogonal axes, the quantum probability of anti-correlation when a pair is measured jointly, averaged uniformly over the three pairs, is

$$R_{\rm quantum} = \frac12 \tag{98}$$

(independent of the quantum state).



Proof. The intuitive reason is that each pair of spin observables is unbiased. More precisely, if we coarse-grain over the effects in the joint POVM $\{F_{X_1X_2}\}$ of Eq. (96) with outcomes corresponding to anti-correlation, we get

$$F_{+-} + F_{-+} = \tfrac12\,\mathbb{1}. \tag{99}$$

Therefore, for all quantum states, the probability of finding anti-correlated results is 1/2. □
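Equation (99) can be confirmed numerically. This Python sketch builds the square POVM of Eq. (96) for the $z$ and $x$ axes and forms the anti-correlation effect $F_{+-} + F_{-+}$. It then evaluates that effect on a few random pure states, using a fixed seed of our choosing. Each probability comes out as 1/2, illustrating the state independence claimed in Proposition 9.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
PAULI = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))

def sigma_dot(v):
    return sum(comp * s for comp, s in zip(v, PAULI))

def projector(m):
    """Rank-one projector whose Bloch vector is the unit vector m / |m|."""
    m = np.asarray(m, dtype=float)
    return (I2 + sigma_dot(m / np.linalg.norm(m))) / 2

z_hat = np.array([0.0, 0.0, 1.0])
x_hat = np.array([1.0, 0.0, 0.0])

# Square POVM of Eq. (96): F_{x1 x2} = (1/2) Pi_{x1 x2}, Bloch vectors (x1 z + x2 x)/sqrt(2)
square = {xs: projector(xs[0] * z_hat + xs[1] * x_hat) / 2
          for xs in product((1, -1), repeat=2)}
anti = square[(1, -1)] + square[(-1, 1)]       # coarse-grained effect of Eq. (99)

# Anti-correlation probability <psi|anti|psi> for a few random pure states
rng = np.random.default_rng(seed=0)
probs = []
for _ in range(5):
    psi = rng.normal(size=2) + 1j * rng.normal(size=2)
    psi /= np.linalg.norm(psi)
    probs.append(float((psi.conj() @ anti @ psi).real))
print(probs)  # every entry is 1/2 up to rounding
```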

There is consequently no bias towards anti-correlation 
and therefore this triple of measurements is not helpful 
for approximating the OS correlations. 



B. Trine spin axes

Our second example consists of noisy versions of spin observables along three axes equally separated in a plane (i.e., separated by a trine, or an angle of $120^\circ$):

$$\hat n_1 = (0,0,1), \qquad \hat n_1 + \hat n_2 + \hat n_3 = 0, \qquad \hat n_j\cdot\hat n_k = -\tfrac12 \ (j \neq k). \tag{100}$$

These are depicted in Fig. 13.

Proposition 10. The triple of measurements defined by Eqs. (92) and (100), that is, noisy spin observables along three equally spaced axes in a plane, is pairwise jointly measurable iff $\eta \le \sqrt{3}-1 \approx 0.73205$, but triplewise jointly measurable only if $\eta \le 2/3$.

In other words, the condition $2/3 < \eta \le \sqrt{3}-1$ is sufficient for the triple to be pairwise jointly measurable but not triplewise jointly measurable.

Again, the proof is provided in Appendix F, but we can understand the result geometrically. The trine directions $\hat n_1$, $\hat n_2$ and $\hat n_3$ of Eq. (100) are indicated in Fig. 13. The POVM that measures a noisy spin observable along the $\hat n_1$-axis jointly with the one along the $\hat n_3$-axis and that saturates $\eta \le \sqrt{3}-1$ is of the form

$$F_{X_1X_2} = w_{X_1X_2}\,\Pi_{X_1X_2}, \tag{101}$$

where

$$w_{++} = w_{--} = \frac{1}{\sqrt{3}+1}, \tag{102}$$
$$w_{+-} = w_{-+} = \frac{\sqrt{3}}{\sqrt{3}+1}, \tag{103}$$

and where the projectors $\{\Pi_{X_1X_2}\}$ are associated with Bloch vectors $\{\hat m_{X_1X_2}\}$ forming the vertices of a square, depicted in Fig. 13. Coarse-graining over $X_2$ yields the POVM $\{F^1_\pm = \frac12\mathbb{1} + \frac12\vec\sigma\cdot\vec s^{\,1}_\pm\}$ with $\vec s^{\,1}_\pm = \pm(\sqrt{3}-1)\,\hat n_1$, depicted in Fig. 13. Similarly, coarse-graining over $X_1$ yields a noisy spin observable associated with Bloch vectors $\vec s^{\,3}_\pm = \pm(\sqrt{3}-1)\,\hat n_3$. Joint measurements of every other pair of spin axes are described similarly.

FIG. 13. Bloch sphere representation of the joint measurement of the noisy spin observables along trine axes $\hat n_1$ and $\hat n_3$.


The POVM that measures noisy spin observables along axes $\hat n_1$, $\hat n_2$ and $\hat n_3$ jointly and that saturates $\eta \le 2/3$ is of the form $\{F_{X_1X_2X_3} = w_{X_1X_2X_3}\Pi_{X_1X_2X_3}\}$. Its weights are $w_{+++} = w_{---} = 0$ (implying that one never obtains a triplewise coincidence in the joint measurement) and $w_{++-} = w_{+-+} = w_{-++} = w_{+--} = w_{-+-} = w_{--+} = 1/3$. The projectors $\{\Pi_{X_1X_2X_3}\}$ are associated with Bloch vectors $\{\hat m_{X_1X_2X_3}\}$ forming the vertices of a hexagon for the six values of $X_1X_2X_3$ such that $w_{X_1X_2X_3} \neq 0$, as depicted in Fig. 14. Coarse-graining over $X_2$ and $X_3$ yields the POVM $\{F^1_\pm = \frac12\mathbb{1} + \frac12\vec\sigma\cdot\vec s^{\,1}_\pm\}$ where $\vec s^{\,1}_\pm = \pm\frac23\hat n_1$, depicted in Fig. 14. Similarly, coarse-graining over $X_1$ and $X_3$ yields a noisy spin observable associated with Bloch vectors $\vec s^{\,2}_\pm = \pm\frac23\hat n_2$, while coarse-graining over $X_1$ and $X_2$ yields one associated with $\vec s^{\,3}_\pm = \pm\frac23\hat n_3$. Note that, unlike the three previous examples, the Bloch directions of the fine-grained (saturating) POVM elements coincide with the Bloch directions of the coarse-grained POVM elements. This is a peculiarity of geometry. The same feature is seen in the dual problem of identifying pure-state ensembles that saturate the bounds of so-called EPR-steering inequalities [52].
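The hexagon construction can be verified with a short Python sketch. It assumes a concrete plane for the trine axes of Eq. (100): $\hat n_2 = (\sqrt3/2, 0, -1/2)$ and $\hat n_3 = (-\sqrt3/2, 0, -1/2)$ in the $x$-$z$ plane. This choice is ours, and any plane gives the same result. The sketch checks that the six weight-1/3 effects sum to the identity and that each marginal has sharpness $2/3$. It also checks that every fine-grained Bloch direction is one of $\pm\hat n_k$.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
PAULI = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))

def sigma_dot(v):
    return sum(comp * s for comp, s in zip(v, PAULI))

def projector(m):
    """Rank-one projector whose Bloch vector is the unit vector m / |m|."""
    m = np.asarray(m, dtype=float)
    return (I2 + sigma_dot(m / np.linalg.norm(m))) / 2

# Assumed concrete trine axes in the x-z plane (n_1 + n_2 + n_3 = 0)
n = [np.array([0.0, 0.0, 1.0]),
     np.array([np.sqrt(3) / 2, 0.0, -0.5]),
     np.array([-np.sqrt(3) / 2, 0.0, -0.5])]

hexagon = {}
for xs in product((1, -1), repeat=3):
    if len(set(xs)) == 1:
        continue                      # w_{+++} = w_{---} = 0
    m = sum(s * v for s, v in zip(xs, n))
    hexagon[xs] = projector(m) / 3    # w = 1/3 for the six remaining outcomes

total = sum(hexagon.values())         # should be the identity
# eta_k = Tr(F^k_+ sigma.n_k) for the marginal obtained by coarse-graining
etas = [np.trace(sum(F for xs, F in hexagon.items() if xs[k] == 1)
                 @ sigma_dot(n[k])).real for k in range(3)]

def along_some_axis(u):
    """True if the unit vector u equals +n_k or -n_k for some k."""
    return any(np.allclose(u, sign * v) for v in n for sign in (1, -1))

# Each unnormalized m has length 2, so m / 2 is the fine-grained Bloch direction
directions_ok = all(along_some_axis(sum(s * v for s, v in zip(xs, n)) / 2)
                    for xs in hexagon)
print(np.allclose(total, I2), etas, directions_ok)
```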





FIG. 14. Bloch sphere representation of the joint measurement of the noisy spin observables along trine axes $\hat n_1$, $\hat n_2$ and $\hat n_3$.



Given the discussion in Sec. IV B, one might expect the trine spin observables to instantiate a better approximation of the OS correlations. Indeed, we have the following proposition that supports this intuition.

Proposition 11. For the triple of measurements defined by Eqs. (92) and (100), that is, a triple of noisy spin observables along trine axes, the quantum probability of anti-correlation when a pair is measured jointly, averaged uniformly over the three pairs, is

$$R_{\rm quantum} = \frac{\sqrt{3}}{\sqrt{3}+1} \approx 0.63397 \tag{104}$$

(independent of the quantum state).



Proof. If, in the joint measurement of Eq. (101), we coarse-grain the two effects that correspond to anti-correlation, we obtain

$$F_{+-} + F_{-+} = \frac{\sqrt{3}}{\sqrt{3}+1}\,\mathbb{1}, \tag{105}$$

from which the result follows trivially. □
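Eqs. (101)-(105) can be checked numerically. This Python sketch assumes $\hat n_3 = (-\sqrt3/2, 0, -1/2)$, one concrete choice at $120^\circ$ from $\hat n_1$ in the $x$-$z$ plane. It builds the weighted POVM and confirms that it sums to the identity. It also confirms that both marginals have sharpness $\sqrt3-1$ and that the anti-correlation effect is $\frac{\sqrt3}{\sqrt3+1}\mathbb 1$. As a cross-check, it evaluates Busch's pairwise criterion, which is not stated in this text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))

def sigma_dot(v):
    return sum(comp * s for comp, s in zip(v, PAULI))

def projector(m):
    """Rank-one projector whose Bloch vector is the unit vector m / |m|."""
    m = np.asarray(m, dtype=float)
    return (I2 + sigma_dot(m / np.linalg.norm(m))) / 2

r3 = np.sqrt(3)
n1 = np.array([0.0, 0.0, 1.0])
n3 = np.array([-r3 / 2, 0.0, -0.5])     # assumed: 120 degrees from n1 in the x-z plane

w = {(1, 1): 1 / (r3 + 1), (-1, -1): 1 / (r3 + 1),     # Eq. (102)
     (1, -1): r3 / (r3 + 1), (-1, 1): r3 / (r3 + 1)}   # Eq. (103)
F = {xs: w[xs] * projector(xs[0] * n1 + xs[1] * n3) for xs in w}   # Eq. (101)

total = sum(F.values())                 # should be the identity
eta_1 = np.trace((F[(1, 1)] + F[(1, -1)]) @ sigma_dot(n1)).real   # marginal on n1
eta_3 = np.trace((F[(1, 1)] + F[(-1, 1)]) @ sigma_dot(n3)).real   # marginal on n3
anti = F[(1, -1)] + F[(-1, 1)]          # anti-correlation effect, Eq. (105)
busch = 2 / (np.linalg.norm(n1 + n3) + np.linalg.norm(n1 - n3))
print(eta_1, eta_3, busch, anti[0, 0].real)
```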



Can we explain this degree of anti-correlation within a generalized-noncontextual ontological model? Given that the measurements involved are nonprojective, we need not represent them as assigning deterministic outcomes for every ontic state. Indeed, as discussed in Sec. III D, for nonprojective measurements one is not warranted in assuming outcome determinism. It follows that the bound of 2/3 on the probability of anti-correlation, Eq. (12), need not apply, since we derived it under the assumption that the measurements are projective. Conceivably, the bound implied by generalized noncontextuality could be smaller for nonprojective measurements, and the quantum degree of anti-correlation might therefore still violate it. As it turns out, however, the bound is actually larger for nonprojective measurements. Therefore the quantum degree of anti-correlation is entirely consistent with an ontological model that is measurement-noncontextual and outcome-deterministic for projective measurements. We show this now.



Generalized-noncontextual models for joint measurements of POVMs



Each measurement that is modeled by a POVM of the form of Eq. (92) can be considered as a convex combination of a projective measurement and a measurement of the trivial two-outcome POVM $\{\mathbb{1}/2, \mathbb{1}/2\}$, as seen in Eq. In Ref. [18], it is proven that within any ontological model, the response function that represents a convex combination of measurement procedures is simply the convex combination of the associated response functions. Ref. [18] also contains a proof that within a measurement-noncontextual model, the response function that represents each outcome of the trivial two-outcome POVM $\{\mathbb{1}/2, \mathbb{1}/2\}$ is the uniform function 1/2, i.e., regardless of the value of $\lambda$ in the ontological model, the two outcomes occur with equal probability. We also recall from Sec. III E that in models of quantum theory, preparation noncontextuality implies outcome determinism for projective measurements. From these facts, we obtain the following result.

Lemma 12. In an ontological model that is generalized-noncontextual, the response function for the $\eta$-sharp spin observable of Eq. (92), denoted by $\mathcal{M}_k$, is

$$p(X_k|\mathcal{M}_k;\lambda) = \eta\,[X_k(\lambda)] + (1-\eta)\left(\tfrac{1}{2}[0] + \tfrac{1}{2}[1]\right), \tag{106}$$

where $[X(\lambda)]$ denotes the response function $p(X|\lambda) = 1$ if $X = X(\lambda)$ and $0$ otherwise.

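This yields a strong constraint on the response function for the joint measurement, denoted $\mathcal{M}_{12}$, of $\eta$-sharp spin observables along distinct axes. The joint response function $p(X_1, X_2|\mathcal{M}_{12};\lambda)$ must yield $p(X_1|\mathcal{M}_1;\lambda)$ when marginalized over $X_2$ and $p(X_2|\mathcal{M}_2;\lambda)$ when marginalized over $X_1$. The most general form that can recover these marginals is

$$\begin{aligned}
p(X_1,X_2|\mathcal{M}_{12};\lambda) ={}& \alpha\,[X_1(\lambda)][X_2(\lambda)] \\
&+ \beta\,[X_1(\lambda)]\left(\tfrac{1}{2}[0] + \tfrac{1}{2}[1]\right) \\
&+ \gamma\left(\tfrac{1}{2}[0] + \tfrac{1}{2}[1]\right)[X_2(\lambda)] \\
&+ \delta\left(\tfrac{1}{2}[0][0] + \tfrac{1}{2}[1][1]\right) \\
&+ \epsilon\left(\tfrac{1}{2}[0][1] + \tfrac{1}{2}[1][0]\right),
\end{aligned} \tag{107}$$

where the marginals are

$$p(X_1|\mathcal{M}_{12};\lambda) = (\alpha+\beta)[X_1(\lambda)] + (\gamma+\delta+\epsilon)\left(\tfrac{1}{2}[0] + \tfrac{1}{2}[1]\right), \tag{108}$$

$$p(X_2|\mathcal{M}_{12};\lambda) = (\alpha+\gamma)[X_2(\lambda)] + (\beta+\delta+\epsilon)\left(\tfrac{1}{2}[0] + \tfrac{1}{2}[1]\right), \tag{109}$$

so that we require

$$\alpha + \beta = \alpha + \gamma = \eta, \tag{110}$$

$$\gamma + \delta + \epsilon = \beta + \delta + \epsilon = 1 - \eta. \tag{111}$$

We infer that $\beta = \gamma$.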

In order to give the model the best chance of reproducing the operational statistics, we consider what values of $\alpha$, $\beta$, $\gamma$, $\delta$ and $\epsilon$ achieve the largest possible amount of anti-correlation. The $\delta$ term always yields correlation, while the $\beta$ and $\gamma$ terms yield correlation as often as anti-correlation. Only the $\alpha$ and $\epsilon$ terms can have anti-correlation more frequently than correlation. Thus, to maximize the amount of anti-correlation, one sets $\beta = \gamma = \delta = 0$. It then follows that $\alpha = \eta$ and $\epsilon = 1 - \eta$.

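The same reasoning applies for the joint measurements of $\mathcal{M}_1$ and $\mathcal{M}_3$ and of $\mathcal{M}_2$ and $\mathcal{M}_3$, so that for all $i, j \in \{1, 2, 3\}$ such that $i \neq j$,

$$p(X_i,X_j|\mathcal{M}_{ij};\lambda) = \eta\,[X_i(\lambda)][X_j(\lambda)] + (1-\eta)\left(\tfrac{1}{2}[0][1] + \tfrac{1}{2}[1][0]\right). \tag{112}$$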
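A short exact-arithmetic check that the response function of Eq. (112) marginalizes to Eq. (106) on both wings for every deterministic assignment $X_i(\lambda), X_j(\lambda)$, and so satisfies Eqs. (110) and (111). This is a Python sketch using the standard `fractions` module; the sample values of $\eta$ used below are arbitrary rationals chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def indicator(value):
    """Deterministic response function [value] over the outcomes {0, 1}."""
    return {x: Fraction(int(x == value)) for x in (0, 1)}


def single_response(eta, x_det):
    """Eq. (106): eta [X_k(lambda)] + (1 - eta)(1/2 [0] + 1/2 [1])."""
    d = indicator(x_det)
    return {x: eta * d[x] + (1 - eta) * HALF for x in (0, 1)}


def pair_response(eta, xi_det, xj_det):
    """Eq. (112): eta [X_i][X_j] + (1 - eta)(1/2 [0][1] + 1/2 [1][0])."""
    di, dj = indicator(xi_det), indicator(xj_det)
    return {(a, b): eta * di[a] * dj[b] + (1 - eta) * (HALF if a != b else 0)
            for a, b in product((0, 1), repeat=2)}


def marginals_match(eta):
    """True if Eq. (112) reproduces Eq. (106) on both wings for every lambda."""
    for xi_det, xj_det in product((0, 1), repeat=2):
        joint = pair_response(eta, xi_det, xj_det)
        first = {a: sum(joint[(a, b)] for b in (0, 1)) for a in (0, 1)}
        second = {b: sum(joint[(a, b)] for a in (0, 1)) for b in (0, 1)}
        if first != single_response(eta, xi_det) or second != single_response(eta, xj_det):
            return False
    return True


print(marginals_match(Fraction(3, 4)))
```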

The question then arises of how much anti-correlation one can have on average for a pair of measurements (assuming the pair is chosen uniformly at random), that is, what is the upper bound on $R_3$? For every $\lambda$, at most two out of the three products $[X_1(\lambda)][X_2(\lambda)]$, $[X_1(\lambda)][X_3(\lambda)]$ and $[X_2(\lambda)][X_3(\lambda)]$ can yield anti-correlation, so the probability of anti-correlation for the $\eta$ term is at most 2/3. Meanwhile, the $1 - \eta$ term always yields anti-correlation. Therefore,

$$R_3 \le \eta\left(\tfrac{2}{3}\right) + (1-\eta) = 1 - \frac{\eta}{3}. \tag{113}$$
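The bound of Eq. (113) can be reproduced by brute force. The Python sketch below enumerates all eight deterministic assignments $(X_1(\lambda), X_2(\lambda), X_3(\lambda))$ to recover the factor 2/3 for the $\eta$ term, adds the always-anti-correlated noise term, and compares the result with the two quantum values discussed below ($R_3^{\text{quantum}} = 1/2$ for orthogonal axes at $\eta = 1/\sqrt{2}$, and $\sqrt{3}/(1+\sqrt{3})$ for trine axes at $\eta = \sqrt{3} - 1$). The helper names are ours.

```python
import math
from fractions import Fraction
from itertools import combinations, product

PAIRS = list(combinations(range(3), 2))  # the pairs (1,2), (1,3), (2,3), 0-based


def anticorrelated_fraction(assignment):
    """Fraction of the three pairs whose deterministic outcomes X_i(lambda) differ."""
    hits = sum(assignment[i] != assignment[j] for i, j in PAIRS)
    return Fraction(hits, len(PAIRS))


# Best case over all 2^3 deterministic assignments: the 2/3 for the eta term.
BEST_DETERMINISTIC = max(anticorrelated_fraction(a) for a in product((0, 1), repeat=3))


def r3_upper_bound(eta):
    """Eq. (113): eta * (2/3) from the deterministic part + (1 - eta) from the noise."""
    return float(eta * BEST_DETERMINISTIC + (1 - eta))


for label, eta, r3_quantum in [("orthogonal", 1 / math.sqrt(2), 0.5),
                               ("trine", math.sqrt(3) - 1, math.sqrt(3) / (1 + math.sqrt(3)))]:
    bound = r3_upper_bound(eta)
    print(f"{label}: R3 quantum = {r3_quantum:.5f} <= bound {bound:.5f}")
```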



One might have expected that the ability to add noise to the response function in the ontological model would not help explain a high degree of anti-correlation, but such an expectation fails to take into account the fact that the noise can itself be anti-correlated and thereby explain more anti-correlation in the statistics. Thus, rather than only being able to explain a probability of anti-correlation of 2/3 in a generalized-noncontextual model, we can explain a probability of anti-correlation of $1 - \eta/3$, which is always greater than 2/3 because $\eta < 1$. For instance, for $\eta = 1/\sqrt{2}$, the upper bound on $R_3$ is $1 - 1/(3\sqrt{2}) \approx 0.76430$, while for $\eta = \sqrt{3} - 1$, it is $(4 - \sqrt{3})/3 \approx 0.75598$.

Because the degree of anti-correlation we found in quantum theory was less than 2/3 in both examples, there is no problem providing a generalized-noncontextual model. More precisely, the degree of quantum anti-correlation obtained in the example with orthogonal spin axes can be explained noncontextually because $R_3^{\text{quantum}} = 1/2 < 0.76430$, and the degree obtained in the example with the trine spin axes can be explained noncontextually because $R_3^{\text{quantum}} \approx 0.63397 < 0.75598$.

Is it the case that, for all triples of nonprojective quantum measurements that can be implemented pairwise but not triplewise, the strength of anti-correlations can be explained by a generalized-noncontextual ontological model? The question remains open, but we expect a positive answer.



VIII. CONCLUDING REMARKS 

There has been a lot of work in recent years on "foils to quantum theory", operational theories that one studies not primarily as competitors to quantum theory, but as useful tools for getting a handle on the principles underlying it. Only by situating quantum theory in a landscape of possible theories does it make sense to speak of the principles that pick it out, and so to answer Wheeler's question: "how come the quantum?". Specker's parable provides an interesting new kind of foil, because the kind of complementarity it exhibits (three measurements that can be implemented jointly pairwise but not triplewise) is not found among projective measurements in quantum theory. This prompts the question: why does quantum theory not have this sort of complementarity? It might be interesting, for instance, to deduce the information-processing power of a foil theory incorporating such correlations. Furthermore, even if we consider a kind of complementarity that can be accommodated in quantum theory, such as five measurements that can be measured in adjacent pairs, there is an interesting question about why the correlations exhibited by quantum theory are not stronger. Why is quantum theory not more contextual or more nonlocal [17, 53–62]? The same sort of question arises for quantum examples of triples of nonprojective measurements that can be implemented pairwise but not triplewise. Why can these not yield the strength of anti-correlations required to obtain a no-go theorem for generalized noncontextuality? We hope that these questions might provide a new angle on the problem of deriving the structure of quantum theory from within a landscape of operational foil theories.
Appendix A: Explicit form of OS correlations in the double-query, 3-box system



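Perfect negative correlation in the outcomes of the joint observables $\mathcal{M}_{12}$, $\mathcal{M}_{13}$ and $\mathcal{M}_{23}$ constrains their statistics to be of the form

$$\begin{aligned}
p(0,1|\mathcal{M}_{12};\mathcal{P}_*) &= q_{12}, & p(1,0|\mathcal{M}_{12};\mathcal{P}_*) &= 1 - q_{12},\\
p(0,1|\mathcal{M}_{13};\mathcal{P}_*) &= q_{13}, & p(1,0|\mathcal{M}_{13};\mathcal{P}_*) &= 1 - q_{13},\\
p(0,1|\mathcal{M}_{23};\mathcal{P}_*) &= q_{23}, & p(1,0|\mathcal{M}_{23};\mathcal{P}_*) &= 1 - q_{23},
\end{aligned} \tag{A1}$$

for $0 \le q_{12}, q_{13}, q_{23} \le 1$. This fixes the statistics for the individual measurements $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$ through the marginals of Eq. (A1). Specifically,

$$p(0|\mathcal{M}_1;\mathcal{P}_*) = \sum_{X_2} p(0,X_2|\mathcal{M}_{12};\mathcal{P}_*) = q_{12}, \tag{A2}$$

$$p(0|\mathcal{M}_1;\mathcal{P}_*) = \sum_{X_3} p(0,X_3|\mathcal{M}_{13};\mathcal{P}_*) = q_{13}, \tag{A3}$$

$$p(0|\mathcal{M}_2;\mathcal{P}_*) = \sum_{X_1} p(X_1,0|\mathcal{M}_{12};\mathcal{P}_*) = 1 - q_{12}, \tag{A4}$$

$$p(0|\mathcal{M}_2;\mathcal{P}_*) = \sum_{X_3} p(0,X_3|\mathcal{M}_{23};\mathcal{P}_*) = q_{23}, \tag{A5}$$

$$p(0|\mathcal{M}_3;\mathcal{P}_*) = \sum_{X_1} p(X_1,0|\mathcal{M}_{13};\mathcal{P}_*) = 1 - q_{13}, \tag{A6}$$

$$p(0|\mathcal{M}_3;\mathcal{P}_*) = \sum_{X_2} p(X_2,0|\mathcal{M}_{23};\mathcal{P}_*) = 1 - q_{23}. \tag{A7}$$

All together, we find that we must have

$$q_{12} = q_{13} = q_{23} = \frac{1}{2},$$

which implies that the correlations are of the OS form stated in the main text.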
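The final step is a 3×3 linear system. A minimal Python sketch (NumPy; the encoding of the rows is our own choice) that equates the two expressions for each single-measurement marginal, (A2) = (A3), (A4) = (A5) and (A6) = (A7), and solves for $q_{12}, q_{13}, q_{23}$:

```python
import numpy as np

# Unknowns q = (q12, q13, q23). Each row equates two expressions for the same
# single-measurement marginal p(0 | M_i; P_*).
A = np.array([
    [1.0, -1.0, 0.0],   # M1: (A2) = (A3)  ->  q12 - q13 = 0
    [-1.0, 0.0, -1.0],  # M2: (A4) = (A5)  ->  -q12 - q23 = -1
    [0.0, -1.0, 1.0],   # M3: (A6) = (A7)  ->  -q13 + q23 = 0
])
b = np.array([0.0, -1.0, 0.0])

# The system is nonsingular, so the solution is unique.
q12, q13, q23 = np.linalg.solve(A, b)
print(q12, q13, q23)
```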



Appendix B: Proof of Theorem 6

Here, we provide the proof of Theorem 6.

Proof. By measurement noncontextuality, the response function depends only on the equivalence class of a measurement procedure. By outcome determinism, the response function for every measurement $\mathcal{M}_S$ is deterministic, so that $p(X_S|\mathcal{M}_S;\lambda) \in \{0, 1\}$. In particular, this is true for singleton sets. It follows that

$$p(X_S|\mathcal{M}_S;\lambda) = \prod_{s\in S} p(X_s|\mathcal{M}_s;\lambda). \tag{B1}$$

We can then define a joint distribution $p(X_1\ldots X_N|\lambda)$ yielding the correct marginals by the product of the single-measurement response functions,

$$p(X_1\ldots X_N|\lambda) = \prod_{s=1}^{N} p(X_s|\mathcal{M}_s;\lambda). \tag{B2}$$

By assumption of the empirical adequacy of the ontological model, there exists a distribution $p(\lambda|\mathcal{P})$ for all $\mathcal{P}$, such that

$$\int d\lambda\, p(X_S|\mathcal{M}_S;\lambda)\, p(\lambda|\mathcal{P}) = p(X_S|\mathcal{M}_S;\mathcal{P}). \tag{B3}$$

Using $p(\lambda|\mathcal{P})$, we can define

$$p(X_1\ldots X_N|\mathcal{P}) = \int d\lambda\, p(X_1\ldots X_N|\lambda)\, p(\lambda|\mathcal{P}), \tag{B4}$$

which has marginal on $X_S$ of

$$\begin{aligned}
p(X_S|\mathcal{P}) &= \sum_{X_{s'}:\, s'\notin S} p(X_1\ldots X_N|\mathcal{P}) \\
&= \int d\lambda \sum_{X_{s'}:\, s'\notin S} p(X_1\ldots X_N|\lambda)\, p(\lambda|\mathcal{P}) \\
&= \int d\lambda\, p(X_S|\mathcal{M}_S;\lambda)\, p(\lambda|\mathcal{P}) \\
&= p(X_S|\mathcal{M}_S;\mathcal{P}).
\end{aligned} \tag{B5}$$

We have therefore shown that $p(X_1\ldots X_N|\mathcal{P})$ is a joint distribution whose marginals yield the operational statistics of all measurements. □
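The construction in this proof can be illustrated on a small example. The Python sketch below uses a hypothetical toy ontological model (four ontic states, three binary measurements, outcome assignments and $p(\lambda|\mathcal{P})$ invented for illustration only), forms the joint distribution of Eqs. (B2) and (B4), and checks Eq. (B5) for every subset $S$ and every outcome string.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical toy model: four ontic states, N = 3 binary measurements.
# OUTCOME[lam][s] is the deterministic outcome X_s(lambda) (outcome determinism).
OUTCOME = {0: (0, 1, 1), 1: (1, 0, 1), 2: (0, 0, 1), 3: (1, 1, 0)}
# Hypothetical preparation P, given as a distribution p(lambda | P).
P_LAMBDA = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
N = 3


def response(subset, values, lam):
    """Eq. (B1): response for the set M_S as a product of singleton responses."""
    result = 1
    for s, x in zip(subset, values):
        result *= int(OUTCOME[lam][s] == x)
    return result


def joint(xs):
    """Eqs. (B2) and (B4): average the product response over p(lambda | P)."""
    return sum(p * response(range(N), xs, lam) for lam, p in P_LAMBDA.items())


def operational(subset, values):
    """Eq. (B3): the statistics the model assigns to measuring the set M_S."""
    return sum(p * response(subset, values, lam) for lam, p in P_LAMBDA.items())


def marginal_of_joint(subset, values):
    """Eq. (B5): marginalize the joint distribution onto the outcomes X_S."""
    return sum(joint(xs) for xs in product((0, 1), repeat=N)
               if all(xs[s] == x for s, x in zip(subset, values)))


# Every subset S and every outcome string X_S: the marginal equals the statistics.
all_match = all(marginal_of_joint(S, vals) == operational(S, vals)
                for size in range(1, N + 1)
                for S in combinations(range(N), size)
                for vals in product((0, 1), repeat=size))
print(all_match)
```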



Appendix C: Maximal quantum violation of the 
n-box-set Klyachko-type Kochen-Specker inequality 

To see that $S_n^{\text{quantum}}$ given in Eq. (27) is indeed the strongest possible quantum violation of inequality (22), it suffices to consider the following polynomial of Hermitian operators,

$$B_n = \sum_{a=1}^{n} X_a X_{a\oplus 1}, \tag{C1}$$

and to note that, for arbitrary Hermitian operators satisfying the commutation relation $[X_a, X_{a\oplus 1}] = 0$, the operator $B_n - n\left(1 - \frac{4\cos\frac{\pi}{n}}{1+\cos\frac{\pi}{n}}\right)\mathbb{1}$ can be written explicitly, Eq. (C2), as a combination of Hermitian operators built from the $X_a$ and from $\omega_n = e^{i2\pi/n}$, the $n$-th root of unity, with coefficients $A_{1j}$ and $A_{2j}$ ($j = 1, \ldots, n-1$) that are trigonometric functions of $2\pi j/n$ and $\pi/n$. It is straightforward to check that $A_{1j}$ and $A_{2j}$ are non-negative for all $j$. Thus, for dichotomic Hermitian observables, which also satisfy $X_a^2 = \mathbb{1}$, the right-hand side of Eq. (C2) becomes a sum of non-negative Hermitian operators and is hence non-negative. As a result, the smallest eigenvalue of $B_n$ is lower bounded by $n\left(1 - \frac{4\cos\frac{\pi}{n}}{1+\cos\frac{\pi}{n}}\right)$, therefore making $S_n^{\text{quantum}}$ given in Eq. (27) the strongest possible quantum violation of inequality (22).

Appendix D: Explicit form of correlations in the separated pair of single-query 3-box systems

Here, we give a simple proof that, after taking into account the no-signaling condition, the nonlocal OS correlations must take the form of Eq. (35).

First, note that by virtue of satisfying Eq. (33), the nonlocal OS correlations may be written as

$$\begin{aligned}
&\forall a \neq b: && p(0,1|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}_*) = q_{ab}, && p(1,0|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}_*) = 1 - q_{ab},\\
&\forall a = b: && p(0,0|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}_*) = q_{ab}, && p(1,1|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}_*) = 1 - q_{ab},
\end{aligned} \tag{D1}$$

where $0 \le q_{ab} \le 1$. These joint probabilities are depicted in Table I.

$$\begin{array}{c|cc|cc|cc}
 & B_1 = 0 & B_1 = 1 & B_2 = 0 & B_2 = 1 & B_3 = 0 & B_3 = 1\\ \hline
A_1 = 0 & q_{11} & 0 & 0 & q_{12} & 0 & q_{13}\\
A_1 = 1 & 0 & 1-q_{11} & 1-q_{12} & 0 & 1-q_{13} & 0\\ \hline
A_2 = 0 & 0 & q_{21} & q_{22} & 0 & 0 & q_{23}\\
A_2 = 1 & 1-q_{21} & 0 & 0 & 1-q_{22} & 1-q_{23} & 0\\ \hline
A_3 = 0 & 0 & q_{31} & 0 & q_{32} & q_{33} & 0\\
A_3 = 1 & 1-q_{31} & 0 & 1-q_{32} & 0 & 0 & 1-q_{33}
\end{array}$$

TABLE I. Joint conditional probability distributions $p(A_a,B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P}_*)$ of Eq. (D1) for all pairs of values of $a$ and $b$. Along the horizontal (vertical) are the three choices of measurement on the B (A) wing, together with the two outcomes for each.

Now we consider the consequences of the no-signaling conditions of Eq. (34). From the independence on $b$ of $\sum_{B_b} p(A_a,B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P})$, we deduce that

$$q_{a1} = q_{a2} = q_{a3}, \tag{D2}$$

implying that the joint distributions can be made to depend on just three parameters, which we denote by $s_a \equiv q_{ab}$. It then follows from the independence on $a$ of $\sum_{A_a} p(A_a,B_b|\mathcal{M}_a,\mathcal{M}_b;\mathcal{P})$ that

$$s_b = 1 - s_a \quad \text{for all } a \neq b, \tag{D3}$$

which implies that $s_a = \frac{1}{2}$ for all $a$, and therefore $q_{ab} = \frac{1}{2}$ for all $a, b$. It follows that the nonlocal OS correlations must be of the form given in Eq. (35) if they are to be non-signaling.
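The last step of Appendix D is again linear. The following Python sketch (NumPy; the encoding is ours) writes Eq. (D3) for every ordered pair $a \neq b$ as $s_a + s_b = 1$ and solves the overdetermined system by least squares; full rank and a consistent solution confirm that $s_a = 1/2$ is the only possibility.

```python
import itertools

import numpy as np

# Unknowns s = (s_1, s_2, s_3), with q_ab = s_a after Eq. (D2).
# Eq. (D3) for every ordered pair a != b reads s_a + s_b = 1.
rows, rhs = [], []
for a, b in itertools.permutations(range(3), 2):
    row = np.zeros(3)
    row[a] += 1.0
    row[b] += 1.0
    rows.append(row)
    rhs.append(1.0)
A, y = np.array(rows), np.array(rhs)

# Overdetermined but consistent: least squares returns the exact solution.
s, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(rank, s)
```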
Appendix E: Maximum quantum violation of the n-box-set Bell-Mermin inequality

In general, the problem of determining the maximal quantum violation of a Bell inequality is highly nontrivial (see, for example, Refs. [23, 63, 64] and references therein). Here, we will show that $R_n^{\text{quantum}}$ defined in Eq. (46) is indeed the maximal winning probability, cf. Eq. (43), allowed in quantum mechanics. To this end, it suffices to show that the winning probability $R_n$ is upper bounded by $R_n^{\text{quantum}}$ in quantum theory. For convenience, we will show this in terms of

$$S_n = \sum_{a=1}^{n}\langle A_aB_a\rangle - \sum_{a,b:\,b=a\oplus1}\langle A_aB_b\rangle - \sum_{a,b:\,a=b\oplus1}\langle A_aB_b\rangle, \tag{E1}$$

which can be re-expressed as

$$S_n = 6n\left(R_n - \frac{1}{2}\right) \tag{E2}$$



using Eq. 

Now, consider the Bell operator [65] corresponding to the above expression defining $S_n$:

$$S_n^{\text{Bell}} = \sum_{a=1}^{n}A_aB_a - \sum_{a,b:\,b=a\oplus1}A_aB_b - \sum_{a,b:\,a=b\oplus1}A_aB_b. \tag{E3}$$

Following a procedure very similar to that described in 
Sec. Ill of Ref. (6(| (see also Ref. [HI), one finds that for 
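arbitrary Hermitian observables $\{A_a\}_{a=1}^{n}$ and $\{B_b\}_{b=1}^{n}$ satisfying $[A_a, B_b] = 0$,

$$n\lambda_{\frac{n+1}{2}}\,\mathbb{1} - S_n^{\text{Bell}} = \frac{1}{4}\sum_{a=1}^{n}\left[\left(\lambda_{\frac{n+1}{2}} + \lambda_a\right)v_{a-}^{\dagger}v_{a-} + \left(\lambda_{\frac{n+1}{2}} - \lambda_a\right)v_{a+}^{\dagger}v_{a+}\right] + \frac{\lambda_{\frac{n+1}{2}}}{2}\left[\sum_{a=1}^{n}\left(\mathbb{1} - A_a^2\right) + \sum_{b=1}^{n}\left(\mathbb{1} - B_b^2\right)\right], \tag{E4}$$

where

$$v_{a\pm} = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}\omega_n^{ak}\left(A_k \pm B_k\right), \qquad \lambda_a = 1 - 2\cos\frac{2\pi a}{n}, \tag{E5}$$

and $\omega_n = e^{i2\pi/n}$. It is easy to verify that

$$\max_{a\in\{1,2,\ldots,n\}}\lambda_a = \lambda_{\frac{n+1}{2}} = 4\cos^2\frac{\pi}{2n} - 1. \tag{E6}$$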



Thus, Eq. (E4) implies that whenever the constraints $A_a^2 = \mathbb{1}$ and $B_b^2 = \mathbb{1}$ are satisfied for all $a, b \in \{1, 2, \ldots, n\}$, the right-hand side of Eq. (E4) becomes a non-negatively weighted sum of squares of polynomials of Hermitian operators, and hence $n\left(4\cos^2\frac{\pi}{2n} - 1\right)\mathbb{1} - S_n^{\text{Bell}} \ge 0$. As a result, the maximal quantum-mechanical expectation value of $S_n^{\text{Bell}}$ is upper bounded by $n\left(4\cos^2\frac{\pi}{2n} - 1\right)$, and so is the maximal value of $S_n$ allowed in quantum theory.

Equivalently, it follows from Eq. (E2) that in quantum theory, the maximal winning probability $R_n$ is upper bounded by

$$\frac{1}{2} + \frac{1}{6n}\times n\left(4\cos^2\frac{\pi}{2n} - 1\right) = \frac{1}{3} + \frac{2}{3}\cos^2\frac{\pi}{2n}, \tag{E7}$$

which is just $R_n^{\text{quantum}}$ given in Eq. (46).
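The spectral facts behind Eqs. (E4) to (E7) can be checked numerically. Writing $S_n^{\text{Bell}} = \sum_{a,b} M_{ab}A_aB_b$, the coefficient matrix $M$ is circulant and its eigenvalues are the $\lambda_a$ of Eq. (E5). The Python sketch below (NumPy; the helper names are ours) builds $M$, compares its spectrum with $\lambda_a$, and checks that $\frac{1}{2} + \lambda_{\max}/6$ reproduces Eq. (E7). It only checks odd $n$.

```python
import numpy as np


def chain_matrix(n):
    """Coefficients M_ab with S_n^Bell = sum_{a,b} M_ab A_a B_b, as in Eq. (E3).

    M has +1 on the diagonal and -1 wherever b = a (+) 1 or a = b (+) 1 (mod n).
    """
    shift = np.roll(np.eye(n), 1, axis=1)  # shift[a, (a + 1) % n] = 1
    return np.eye(n) - shift - shift.T


def lambdas(n):
    """lambda_a = 1 - 2 cos(2 pi a / n) for a = 1, ..., n, as in Eq. (E5)."""
    a = np.arange(1, n + 1)
    return 1 - 2 * np.cos(2 * np.pi * a / n)


def r_quantum(n):
    """Right-hand side of Eq. (E7): 1/3 + (2/3) cos^2(pi / 2n)."""
    return 1 / 3 + (2 / 3) * np.cos(np.pi / (2 * n)) ** 2


for n in (3, 5, 7):  # odd n only
    lam_max = np.linalg.eigvalsh(chain_matrix(n)).max()
    # Eq. (E2) with S_n <= n * lam_max gives R_n <= 1/2 + lam_max / 6.
    print(n, lam_max, 0.5 + lam_max / 6, r_quantum(n))
```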



Appendix F: Necessary and sufficient conditions for 
joint measurability of noisy spin observables 

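Theorem 13. Consider a set of noisy spin observables along the axes $\mathbf{n}_k$, that is, a set of POVMs $\{E^{(k)}_{X_k}\}$ with $X_k \in \{+1, -1\}$ of the form

$$E^{(k)}_{X_k} = \frac{1}{2}\mathbb{1} + \frac{\eta}{2}X_k\,\boldsymbol{\sigma}\cdot\mathbf{n}_k. \tag{F1}$$

Defining $2^N$ different 3-vectors

$$\mathbf{m}_{X_1\ldots X_N} = \sum_{k=1}^{N}X_k\mathbf{n}_k, \tag{F2}$$

a necessary condition for the spin observables to be jointly measurable is that

$$\eta \le \frac{1}{N}\,\frac{\sum_{X_1\ldots X_N}|\mathbf{m}_{X_1\ldots X_N}|^2}{\sum_{X_1\ldots X_N}|\mathbf{m}_{X_1\ldots X_N}|}, \tag{F3}$$

and a sufficient condition is that

$$\eta \le \frac{2^N}{\sum_{X_1\ldots X_N}|\mathbf{m}_{X_1\ldots X_N}|}. \tag{F4}$$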

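Proof. Clearly,

$$\eta = \mathrm{Tr}\left[(\boldsymbol{\sigma}\cdot X_k\mathbf{n}_k)\,E^{(k)}_{X_k}\right]. \tag{F5}$$

But given that this equality holds for both values of $X_k$ and for all $k$, we have

$$\eta = \frac{1}{2N}\sum_{k=1}^{N}\sum_{X_k}\mathrm{Tr}\left[(\boldsymbol{\sigma}\cdot X_k\mathbf{n}_k)\,E^{(k)}_{X_k}\right]. \tag{F6}$$

Recall that joint measurability of the POVMs $\{E^{(k)}_{X_k}\}$ for different $k$ implies the existence of another POVM $\{E_{X_1X_2\ldots X_N}\}$ such that

$$E^{(k)}_{X_k} = \sum_{X_1\ldots X_N\setminus X_k} E_{X_1\ldots X_N}. \tag{F7}$$

Consequently,

$$\eta = \frac{1}{2N}\sum_{X_1\ldots X_N}\mathrm{Tr}\left[\left(\boldsymbol{\sigma}\cdot\sum_{k=1}^{N}X_k\mathbf{n}_k\right)E_{X_1\ldots X_N}\right]. \tag{F8}$$

Defining $\mathbf{m}_{X_1\ldots X_N}$ as above, we may write $\eta$ as

$$\eta = \frac{1}{2N}\sum_{X_1\ldots X_N}|\mathbf{m}_{X_1\ldots X_N}|\,\mathrm{Tr}\left[(\boldsymbol{\sigma}\cdot\hat{\mathbf{m}}_{X_1\ldots X_N})E_{X_1\ldots X_N}\right], \tag{F9}$$

where $\hat{\mathbf{m}}_{X_1\ldots X_N} = \mathbf{m}_{X_1\ldots X_N}/|\mathbf{m}_{X_1\ldots X_N}|$. We then note that

$$\mathrm{Tr}\left[(\boldsymbol{\sigma}\cdot\hat{\mathbf{m}}_{X_1\ldots X_N})E_{X_1\ldots X_N}\right] \le \mathrm{Tr}\left[E_{X_1\ldots X_N}\right] \tag{F10}$$

to obtain the inequality

$$\eta \le \frac{1}{2N}\sum_{X_1\ldots X_N}|\mathbf{m}_{X_1\ldots X_N}|\,\mathrm{Tr}\left[E_{X_1\ldots X_N}\right]. \tag{F11}$$

We need only determine the maximum value of the right-hand side in a variation over all POVMs $\{E_{X_1\ldots X_N}\}$. Given that $\sum_{X_1\ldots X_N}E_{X_1\ldots X_N} = \mathbb{1}$, we know that $\sum_{X_1\ldots X_N}\mathrm{Tr}[E_{X_1\ldots X_N}] = 2$. Consequently, the $2^N$-dimensional vector $\left(\frac{1}{2}\mathrm{Tr}[E_{X_1\ldots X_N}]\right)_{X_1\ldots X_N}$ has unit 1-norm. Thinking of $\sum_{X_1\ldots X_N}|\mathbf{m}_{X_1\ldots X_N}|\,\mathrm{Tr}[E_{X_1\ldots X_N}]$ as a scalar product, we see that it is maximized by taking $\left(\frac{1}{2}\mathrm{Tr}[E_{X_1\ldots X_N}]\right)_{X_1\ldots X_N}$ to be a vector parallel to $\left(|\mathbf{m}_{X_1\ldots X_N}|\right)_{X_1\ldots X_N}$ with unit 1-norm, that is,

$$\frac{1}{2}\mathrm{Tr}[E_{X_1\ldots X_N}] = \frac{|\mathbf{m}_{X_1\ldots X_N}|}{\sum_{X'_1\ldots X'_N}|\mathbf{m}_{X'_1\ldots X'_N}|}, \tag{F12}$$

which yields the necessary condition on $\eta$, Eq. (F3).

To derive the sufficient condition, Eq. (F4), we construct a POVM that jointly measures a set of spin observables with a value of $\eta$ saturating the inequality. Any set of observables with smaller $\eta$ can then be jointly measured by simply adding uniformly random noise to this POVM.

The simulating POVM is

$$E_{X_1\ldots X_N} = \frac{2|\mathbf{m}_{X_1\ldots X_N}|}{\sum_{X'_1\ldots X'_N}|\mathbf{m}_{X'_1\ldots X'_N}|}\left(\frac{1}{2}\mathbb{1} + \frac{1}{2}\boldsymbol{\sigma}\cdot\hat{\mathbf{m}}_{X_1\ldots X_N}\right). \tag{F13}$$

It suffices to demonstrate that this is indeed a POVM and that it coarse-grains to the appropriate noisy spin observables. First, note that

$$\sum_{X_1\ldots X_N}\mathbf{m}_{X_1\ldots X_N} = \sum_{X_1\ldots X_N}\sum_{j=1}^{N}X_j\mathbf{n}_j = \sum_{j=1}^{N}\mathbf{n}_j\left(\sum_{X_1\ldots X_N}X_j\right) = 0.$$

It is then easy to verify that

$$\sum_{X_1\ldots X_N}E_{X_1\ldots X_N} = \frac{\sum_{X_1\ldots X_N}\left(|\mathbf{m}_{X_1\ldots X_N}|\,\mathbb{1} + \boldsymbol{\sigma}\cdot\mathbf{m}_{X_1\ldots X_N}\right)}{\sum_{X'_1\ldots X'_N}|\mathbf{m}_{X'_1\ldots X'_N}|} = \mathbb{1}.$$

Also, note that because $\mathbf{m}_{X_1\ldots X_k\ldots X_N} = -\mathbf{m}_{-X_1\ldots -X_k\ldots -X_N}$, it follows that $|\mathbf{m}_{X_1\ldots X_N}| = |\mathbf{m}_{-X_1\ldots -X_N}|$ and consequently that, with $X_k$ held fixed,

$$\sum_{\{X_j\}_{j\neq k}}|\mathbf{m}_{X_1\ldots X_N}| = \frac{1}{2}\sum_{X_1\ldots X_N}|\mathbf{m}_{X_1\ldots X_N}|, \tag{F14}$$

and that

$$\sum_{\{X_j\}_{j\neq k}}E_{X_1\ldots X_N} = \frac{\sum_{\{X_j\}_{j\neq k}}\left(|\mathbf{m}_{X_1\ldots X_N}|\,\mathbb{1} + \boldsymbol{\sigma}\cdot\mathbf{m}_{X_1\ldots X_N}\right)}{\sum_{X'_1\ldots X'_N}|\mathbf{m}_{X'_1\ldots X'_N}|} = \frac{1}{2}\mathbb{1} + \frac{1}{2}\,\frac{2^N}{\sum_{X'_1\ldots X'_N}|\mathbf{m}_{X'_1\ldots X'_N}|}\,X_k\,\boldsymbol{\sigma}\cdot\mathbf{n}_k.$$

This establishes the sufficient condition. □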
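The sufficiency argument can be checked numerically for any set of axes. The Python sketch below (NumPy; the function names are ours) builds the POVM of Eq. (F13), confirms that its elements are positive and sum to the identity, and checks that each coarse-graining of the form of Eq. (F7) is a noisy spin observable of the form of Eq. (F1) with $\eta = 2^N/\sum|\mathbf{m}|$. The example uses three orthogonal axes along x, y and z, an illustrative choice.

```python
import itertools

import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))


def sigma_dot(v):
    """Return sigma . v for a real 3-vector v."""
    return sum(c * s for c, s in zip(v, PAULI))


def simulating_povm(axes):
    """Joint POVM of Eq. (F13) for unit vectors n_1, ..., n_N.

    Returns the POVM as a dict keyed by sign tuples (X_1, ..., X_N), together
    with the sharpness eta = 2^N / sum |m| of the simulated spin observables.
    """
    outcomes = list(itertools.product((+1, -1), repeat=len(axes)))
    m = {x: sum(xk * nk for xk, nk in zip(x, axes)) for x in outcomes}
    total = sum(np.linalg.norm(v) for v in m.values())
    povm = {}
    for x, v in m.items():
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            povm[x] = np.zeros((2, 2), dtype=complex)  # |m| = 0 gives weight 0
        else:
            povm[x] = (2 * norm / total) * 0.5 * (I2 + sigma_dot(v / norm))
    return povm, 2 ** len(axes) / total


def coarse_grain(povm, k, xk):
    """Sum over every outcome except X_k (0-based k), as in Eq. (F7)."""
    return sum(E for x, E in povm.items() if x[k] == xk)


def is_valid_povm(povm, tol=1e-12):
    """Positive semidefinite elements that sum to the identity."""
    positive = all(np.linalg.eigvalsh(E).min() >= -tol for E in povm.values())
    return positive and np.allclose(sum(povm.values()), I2)


# Example: three orthogonal axes, for which eta should come out as 1/sqrt(3).
axes = list(np.eye(3))
povm, eta = simulating_povm(axes)
print(eta, is_valid_povm(povm))
```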

Corollary 14. The necessary and sufficient conditions for joint measurability of a set of spin observables are: for a pair of orthogonal spin axes,

$$\eta \le \frac{1}{\sqrt{2}}, \tag{F15}$$

for a triple of orthogonal spin axes,

$$\eta \le \frac{1}{\sqrt{3}}, \tag{F16}$$

for a pair of trine spin axes,

$$\eta \le \sqrt{3} - 1, \tag{F17}$$

for a triple of trine spin axes,

$$\eta \le \frac{2}{3}. \tag{F18}$$

To saturate each of these inequalities, it suffices to implement the POVM defined in Eq. (F13).
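The four bounds can be reproduced directly from Eqs. (F2) to (F4). In this Python sketch the orthogonal axes are taken along x, y and z, and the trine axes at 120° in the x-z plane; these orientations are illustrative assumptions, since the bounds depend only on the angles between the axes.

```python
import itertools

import numpy as np


def m_norms(axes):
    """|m_{X_1...X_N}| = |sum_k X_k n_k| for all 2^N sign patterns (Eq. (F2))."""
    return np.array([np.linalg.norm(sum(x * n for x, n in zip(signs, axes)))
                     for signs in itertools.product((+1, -1), repeat=len(axes))])


def necessary_bound(axes):
    """Right-hand side of Eq. (F3): (1/N) sum |m|^2 / sum |m|."""
    norms = m_norms(axes)
    return (norms ** 2).sum() / (len(axes) * norms.sum())


def sufficient_bound(axes):
    """Right-hand side of Eq. (F4): 2^N / sum |m|."""
    return 2 ** len(axes) / m_norms(axes).sum()


orthogonal = list(np.eye(3))
trine = [np.array([np.sin(t), 0.0, np.cos(t)]) for t in (0.0, 2 * np.pi / 3, 4 * np.pi / 3)]
examples = {
    "orthogonal pair": orthogonal[:2],
    "orthogonal triple": orthogonal,
    "trine pair": trine[:2],
    "trine triple": trine,
}
for name, axes in examples.items():
    print(f"{name}: necessary {necessary_bound(axes):.5f}, "
          f"sufficient {sufficient_bound(axes):.5f}")
```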
Proof. We begin by establishing the values of $\{|\mathbf{m}_{X_1\ldots X_N}|\}$ for each of our four examples. For orthogonal spin axes, defined in Eq. (94), we have for $N = 2$,

$$\forall X_1, X_2:\ |\mathbf{m}_{X_1X_2}| = |X_1\mathbf{n}_1 + X_2\mathbf{n}_2| = \sqrt{2}, \tag{F19}$$

and for $N = 3$,

$$\forall X_1, X_2, X_3:\ |\mathbf{m}_{X_1X_2X_3}| = |X_1\mathbf{n}_1 + X_2\mathbf{n}_2 + X_3\mathbf{n}_3| = \sqrt{3}. \tag{F20}$$

For trine spin axes, defined in Eq. (100), we have for $N = 2$,

$$|\mathbf{m}_{++}| = |\mathbf{m}_{--}| = 1, \tag{F21}$$

$$|\mathbf{m}_{+-}| = |\mathbf{m}_{-+}| = \sqrt{3}, \tag{F22}$$

and for $N = 3$, we have (making use of the fact that $\mathbf{n}_i + \mathbf{n}_j = -\mathbf{n}_k$ for $i, j, k$ distinct)

$$|\mathbf{m}_{+++}| = |\mathbf{m}_{---}| = 0, \tag{F23}$$

$$|\mathbf{m}_{++-}| = |\mathbf{m}_{--+}| = |\mathbf{m}_{+-+}| = |\mathbf{m}_{-+-}| = |\mathbf{m}_{-++}| = |\mathbf{m}_{+--}| = 2. \tag{F24}$$

It is then straightforward to verify in each case that the necessary and sufficient conditions on $\eta$, Eqs. (F3) and (F4), coincide and yield the bounds given in Eqs. (F15)–(F18). We have already shown that the bound of the sufficient condition is saturated by the POVM of Eq.